Since VXLAN was introduced in 2014, it has become an important component of modern data center network fabrics. This blog reviews what VXLAN is, why it was developed, how it is being used in data centers, and its advantages over other virtualization technologies. In an upcoming blog, we will look at some innovative VXLAN applications outside the data center.
What is VXLAN?
Virtual eXtensible Local Area Network (VXLAN) is an Internet standard protocol that provides a means of encapsulating Ethernet (Layer 2) frames over an IP (Layer 3) network, a concept often referred to as “tunneling.” This allows devices and applications to communicate across a large physical network as if they were located on the same Ethernet Layer 2 network.
Tunneling approaches such as VXLAN provide an important tool to virtualize the physical network, often called the “underlay,” and allow for connectivity to be defined and managed as a set of virtual connections, called the “overlay.” These virtual connections can be created, modified and removed as needed without any change to the physical underlay network. (Mike Capuano’s blog, What to Know About Data Center Overlay Networks, provides a deeper dive on overlays.)
While VXLAN is only one of many virtual networking or tunneling technologies, it addresses several scaling challenges in data center networks better than alternative technologies, as we will discuss below. Because of these advantages, modern data center architectures for cloud computing now generally combine a “scale-out” IP (L3) leaf-spine underlay based on a robust routing protocol (such as BGP) with a VXLAN-based overlay, as shown in Figure 1.
Figure 1: Scalable Data Center Fabric Architecture with L3 Underlay and VXLAN Overlay
VXLAN Frame Format
Below is a simplified view of the VXLAN frame format.
|IP||UDP||VXLAN Header||Encapsulated Ethernet Frame|
Figure 2: Simplified VXLAN Frame Format
The VXLAN protocol encapsulates Ethernet frames in a VXLAN header that includes a VXLAN Network Identifier (VNI), a value that distinguishes each VXLAN tunnel (aka “network”). Because the VNI consists of 24 bits, the number of possible VNIs in a network is over 16 million. As we will discuss below, this offers an important scalability advantage over older Virtual Local Area Network (VLAN) technology. VXLAN frames are then encapsulated in UDP (User Datagram Protocol) over IP so they can be routed across a layer 3 network.
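To make the header layout concrete, here is a minimal Python sketch of the 8-byte VXLAN header as defined in RFC 7348: 8 flag bits, 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. The function names are illustrative assumptions rather than part of any library, and the sketch builds only the VXLAN header, not the outer UDP and IP headers.

```python
import struct

VXLAN_UDP_PORT = 4789          # IANA-assigned UDP destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08    # "I" flag: the VNI field carries a valid value
VNI_SPACE = 2 ** 24            # 16,777,216 possible VNIs


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI (illustrative helper)."""
    if not 0 <= vni < VNI_SPACE:
        raise ValueError("VNI must fit in 24 bits")
    # First 32-bit word: flags in the top byte, then 24 reserved bits.
    # Second 32-bit word: VNI in the top 24 bits, then 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the VNI-valid flag."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


def encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prepend the VXLAN header to an Ethernet frame (UDP/IP not included)."""
    return pack_vxlan_header(vni) + inner_frame
```

For example, `pack_vxlan_header(5000).hex()` yields `0800000000138800`, and `unpack_vni` recovers 5000 from those bytes.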
VXLAN Tunnel End Points (VTEPs)
The endpoints of the tunnel, where frames are encapsulated and decapsulated, are known as VXLAN Tunnel End Points (VTEPs). This encapsulation may be done on a server that hosts virtual machines (VMs) or containerized applications, or it may be implemented in a network processor in an Ethernet switch. Both server-based and switch-based VTEPs are represented in Figure 3. In this figure, the VXLAN overlay fabric connects server- and switch-based VTEPs and stretches across two geographically separated data centers connected by a layer 3 routed wide area network (WAN) or other data center interconnect (DCI) transport.
Figure 3: VXLAN Tunnels and VTEPs in a Multi-site Data Center Fabric
In his overlay blog, Mike Capuano discussed the pros and cons of server-based vs. switch-based VTEPs. Server-based VTEPs can support more distributed overlay network services, such as fine-grained microsegmentation for security. However, server-based VTEPs that run in software use server CPU cycles that could otherwise be used by applications. VTEPs implemented in switches are typically hardware accelerated, so they eliminate that performance bottleneck.
Now we are seeing an emerging option that combines the benefits of hardware acceleration with highly distributed services: the data processing unit, or DPU, which is a network interface card (NIC) installed in the server that incorporates powerful data processing silicon to accelerate networking functions including the VXLAN overlay. (DPUs are also called “SmartNICs” but they generally have far greater processing power and functionality than earlier generations of SmartNICs.) We anticipate that many high-performance data center fabrics will incorporate a mix of hardware-accelerated VTEPs in both switches and DPUs, as shown in Figure 3.
VXLAN Goals and Advantages versus Alternatives
VXLAN is not the first or only network virtualization technique, but it offers substantial advantages over many alternatives for data center network virtualization. At a high level, the most important goals for VXLAN are:
- Underlay networks can be built using scalable layer 3 architectures with high resilience, high availability and predictable performance.
- Virtual connections can be defined in software, enabling software-defined network (SDN) automation to increase network operations agility and efficiency and reduce human error.
- Virtualization can be scaled to millions of tunnels and endpoints, enabling fine-grained, secure segmentation between applications and tenants.
When combined with an appropriate control plane technology and network automation framework, VXLAN meets those goals more effectively than many alternative approaches. Let’s review a few of the alternatives.
VXLAN vs. VLANs in Data Center Networks
Data center networks were traditionally built using Ethernet switches without any overlay protocol for virtualization. In this architecture, each switch acts as an Ethernet MAC bridge and implements the spanning tree protocol to avoid loops in the network. In the simplest implementation, all devices and VMs are connected to the same layer 2 broadcast domain. If segmentation or isolation of applications or tenants in these networks is needed, it is provided by Virtual LANs (VLANs), denoted by a 12-bit VLAN ID added to the Ethernet frame header (analogous to the VXLAN virtual network identifier). This extra header is sometimes called a “.1Q tag” alluding to the IEEE 802.1Q standard.
While this type of network works well enough for a single-tenant data center at a small scale, it has many drawbacks for larger scale data centers, especially multi-tenant data centers. With only ~4000 unique VLAN IDs available (the 12-bit field yields 4,094 usable values), segmentation options are limited. Perhaps more importantly, the spanning tree protocol is poorly suited to scale-out data center fabrics, both because it makes inefficient use of redundant links and because it is far less resilient than layer 3 routing techniques. Large layer 2 networks are also vulnerable to broadcast storms that can take down the entire network.
The VXLAN standard discusses these and other limitations in more detail, and details how using a VXLAN overlay with a layer 3 underlay addresses them.
VXLAN vs. Provider Bridges, VLAN Tag Stacking, Q-in-Q
One approach to avoiding the limit of ~4000 VLAN identifiers is to add a second VLAN tag, an approach referred to as tag stacking or “Q-in-Q” (due to the use of two “.1Q tags”) and covered in the IEEE Provider Bridges standard. The typical service provider use case envisioned for this approach allows the service provider to use the outer tag (S-tag) to provide segmentation or isolation between its customers or tenants while the inner tag (C-tag) is used by the customer, so each customer can use the full range of ~4000 VLANs without concern for what other customers are using.
|Destination MAC||Source MAC||Outer 802.1Q VLAN Tag||Inner 802.1Q VLAN Tag||Ethertype + Payload + CRC|
Figure 4: Simplified QinQ Frame Format
Using two tags allows for up to 16 million unique combinations (equivalent to VXLAN, though somewhat less flexible than a single 24-bit VNI), which addresses the VLAN scalability issue. It does not, however, address the inefficiency and poor resilience inherent in layer 2 networks.
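The arithmetic behind these numbers can be checked with a short Python sketch. It packs an outer S-tag and inner C-tag using the standard tag layout (a 16-bit TPID followed by a 3-bit priority, a 1-bit drop-eligible flag and the 12-bit VLAN ID) and compares the resulting ID space with a 24-bit VNI. The helper names are assumptions made for illustration.

```python
import struct

TPID_CTAG = 0x8100   # customer tag (IEEE 802.1Q)
TPID_STAG = 0x88A8   # service tag (IEEE 802.1ad Provider Bridges)

VLAN_ID_SPACE = 2 ** 12              # 4096 values in a 12-bit VLAN ID
USABLE_VLAN_IDS = VLAN_ID_SPACE - 2  # 0 and 4095 are reserved, leaving 4094
QINQ_ID_SPACE = VLAN_ID_SPACE ** 2   # outer tag x inner tag combinations
VNI_SPACE = 2 ** 24                  # single 24-bit VXLAN VNI


def pack_vlan_tag(tpid: int, vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Pack one 4-byte VLAN tag: TPID, then PCP(3) | DEI(1) | VID(12)."""
    if not 0 <= vid < VLAN_ID_SPACE:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", tpid, tci)


def pack_qinq_tags(s_vid: int, c_vid: int) -> bytes:
    """Stack an outer S-tag (provider) on an inner C-tag (customer)."""
    return pack_vlan_tag(TPID_STAG, s_vid) + pack_vlan_tag(TPID_CTAG, c_vid)
```

Here `QINQ_ID_SPACE` and `VNI_SPACE` both evaluate to 16,777,216, but the Q-in-Q space is split into two independent 12-bit fields, whereas the VNI is a single flat value.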
VXLAN vs. TRILL and Shortest Path Bridging
Subsequent standards known as “Transparent Interconnection of Lots of Links (TRILL)” and “Shortest Path Bridging (SPB)” attempted to address the efficiency and resilience problems of spanning tree by borrowing from layer 3 link-state routing, specifically the widely used IS-IS routing protocol, which does not require an IP network. These are sometimes referred to as “MAC-in-MAC” approaches because a second Ethernet MAC address is added to the frame for forwarding between the TRILL-enabled or SPB-enabled bridges.
Both standards gained attention and they were often compared and contrasted, but neither achieved consensus. They also shared a significant drawback, which was the need for specialized hardware. Some implementations, such as Cisco’s FabricPath, also diverged from the standards, raising concerns about interoperability and vendor “lock-in.”
VXLAN vs. MPLS for Data Center Fabrics
MPLS Layer 2 VPNs (L2VPNs) provide layer 2 connections across a layer 3 network, but not just any layer 3 network. The routers in the network must all be IP/MPLS routers. Virtual networks are isolated using MPLS pseudowire encapsulation, and MPLS labels can be stacked, analogous to VLAN tag stacking, to enable a large number of virtual networks.
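As a rough illustration of label stacking, the Python sketch below packs MPLS label stack entries using the standard 32-bit layout: a 20-bit label, a 3-bit traffic class, a 1-bit bottom-of-stack flag and an 8-bit TTL. The helper names are assumptions, and so are the label values in the usage note, where an outer transport label sits on top of an inner pseudowire label.

```python
import struct

LABEL_SPACE = 2 ** 20  # 1,048,576 values per 20-bit label


def pack_label_entry(label: int, bottom_of_stack: bool,
                     traffic_class: int = 0, ttl: int = 64) -> bytes:
    """Pack one 4-byte MPLS label stack entry."""
    if not 0 <= label < LABEL_SPACE:
        raise ValueError("MPLS label must fit in 20 bits")
    entry = (label << 12) | (traffic_class << 9) | (int(bottom_of_stack) << 8) | ttl
    return struct.pack("!I", entry)


def pack_label_stack(labels, ttl: int = 64) -> bytes:
    """Pack labels outermost first; only the last entry sets the S bit."""
    last = len(labels) - 1
    return b"".join(
        pack_label_entry(label, bottom_of_stack=(i == last), ttl=ttl)
        for i, label in enumerate(labels)
    )
```

Calling `pack_label_stack([16, 1000])` produces two entries, with only the second (the hypothetical pseudowire label 1000) marked as bottom of stack.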
IP/MPLS is commonly used in telecom service provider networks, and as a result many service provider L2VPN services are implemented with MPLS. These include point to point L2VPNs, sometimes called pseudowires, and multipoint L2VPNs implemented according to the Virtual Private LAN Service (VPLS) standard. These services often conform to Metro Ethernet Forum (MEF) Carrier Ethernet service definitions for E-Line (point to point) and E-LAN (multipoint), respectively.
Because MPLS and its associated control plane protocols are designed for highly scalable layer 3 service provider networks, some data center operators have used MPLS L2VPNs in their data center networks to overcome the scaling and resilience limitations of layer 2 switched networks, as shown in Figure 5.
Figure 5: MPLS-based Data Center Fabric
This approach did not become widespread for several reasons.
- MPLS-capable routers tend to be more costly than non-MPLS routers, and much more costly than data center-class layer 3 switches. VXLAN support, including hardware-accelerated VTEP functionality, is now widely available in commodity switching silicon from Broadcom and others, and in the new class of server-based DPUs.
- MPLS-based VPN solutions require tight coupling between edge and core devices, so every node in the data center network must be MPLS-capable. By contrast, VXLAN only requires VTEPs in the edge nodes (e.g. a leaf switch or DPU) and the data center spines and data center interconnect (DCI) can be implemented using any IP-capable device or IP transport network.
- Outside of large service providers, MPLS is a niche technology with a steep learning curve, so relatively few network engineers are comfortable building and operating MPLS-based networks. VXLAN is simpler and is becoming a foundational technology widely understood by data center network engineers.
Given its advantages, VXLAN is overwhelmingly preferred to MPLS in data center networks. (In fact, VXLAN is even proving to be a viable alternative to MPLS to provide Carrier Ethernet services in some service provider networks, a topic that we will explore more in an upcoming blog.)
VXLAN vs. Other Overlay Protocols
VXLAN was not the first attempt to define an overlay protocol capable of extending layer 2 services across pure layer 3 underlays.
- VXLAN vs. OTV: Overlay Transport Virtualization (OTV) is a Cisco proprietary approach with many similarities to VXLAN. OTV incorporates a control plane protocol to scale MAC address learning, reduce traffic flooding, and isolate layer 2 failure domains. Besides the significant downside of being a proprietary solution, OTV is also seen as having disadvantages relating to load balancing and convergence when compared to VXLAN with an SDN or BGP EVPN control plane (more on VXLAN control plane options below).
- VXLAN vs. NVGRE: NVGRE (Network Virtualization Using Generic Routing Encapsulation) builds on GRE, a long-standing encapsulation standard widely supported in routers. Like VXLAN it includes a 24-bit network identifier for up to 16 million subnets. VXLAN and NVGRE were introduced around the same time, but as VXLAN took off due to its simplicity, NVGRE was largely left behind.
- VXLAN vs. GENEVE: GENEVE (a portmanteau for Generic Network Virtualization Encapsulation) is a relatively new protocol that is meant to be a “unified” approach incorporating the flexibility of NVGRE while addressing some perceived limitations of VXLAN, including lack of a protocol identifier (for mixing Ethernet and non-Ethernet traffic in a tunnel), limited support for operation, administration and maintenance (OAM) packets, and no mechanism for standardized extensions. For most VXLAN use cases, especially in data center fabrics, these are not important limitations, so GENEVE adoption seems unlikely to surpass VXLAN any time soon, but it is gaining traction in some vendor solutions, such as VMware’s NSX-T.
Technically, VXLAN, NVGRE and GENEVE all provide very similar capabilities and they can all work with the same control planes, such as SDN or BGP EVPN, but so far VXLAN is far more widely implemented.
Summary of Virtualization Technologies
The table below summarizes many of the key points made above comparing VXLAN to alternative data center network virtualization technologies.
VXLAN Control Planes and Automation
In principle, VXLAN overlays can be manually configured with static MAC to VTEP IP address mapping, but in practice some type of control plane or automation framework is needed to achieve meaningful network scalability and agility. The VXLAN standard describes a data plane learning approach and also emphasizes that other control plane options are possible.
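A static mapping of this kind amounts to a lookup table, as in the minimal Python sketch below. The VNIs, MAC addresses and VTEP IP addresses are hypothetical values chosen for illustration, and the table is not tied to any particular vendor's configuration syntax.

```python
from typing import Optional

# Hypothetical static entries: (VNI, destination MAC) -> remote VTEP IP address.
STATIC_MAC_TO_VTEP = {
    (10100, "00:11:22:33:44:55"): "192.0.2.11",
    (10100, "00:11:22:33:44:66"): "192.0.2.12",
    (10200, "00:11:22:33:44:55"): "192.0.2.13",  # same MAC, different tenant VNI
}


def lookup_remote_vtep(vni: int, dst_mac: str) -> Optional[str]:
    """Return the VTEP IP to tunnel to, or None if no static entry exists."""
    return STATIC_MAC_TO_VTEP.get((vni, dst_mac.lower()))
```

Every new host or VM move means another manual edit to a table like this on every VTEP, which is why static mapping does not scale.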
VXLAN Data Plane Learning with IP Multicast
The approach described in the VXLAN standard extends standard MAC address learning to create MAC to VTEP IP address mapping without fundamentally altering the way learning works. IP Multicast in the underlay is used to transmit layer 2 broadcast, unknown unicast and multicast (BUM) traffic. This approach has some drawbacks relative to other approaches described below. First, it expands layer 2 broadcast and failure domains, rather than isolating them. Second, the use of IP Multicast tightly couples the underlay network to the overlay and increases management complexity, compared to a typical IP (unicast) network.
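The learning behavior can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a VTEP implementation: the class name, the VNI-to-multicast-group mapping and all addresses are hypothetical, and aging of learned entries is omitted.

```python
def is_group_address(mac: str) -> bool:
    """True for broadcast/multicast MACs (low bit of the first octet is set)."""
    return bool(int(mac.split(":")[0], 16) & 1)


class FloodAndLearnVtep:
    """Simplified model of data plane learning with an IP multicast underlay."""

    def __init__(self, vni_to_group):
        self.vni_to_group = dict(vni_to_group)  # VNI -> underlay multicast group
        self.mac_table = {}                     # (VNI, MAC) -> remote VTEP IP

    def on_receive(self, vni: int, outer_src_ip: str, inner_src_mac: str) -> None:
        """Learn that the inner source MAC sits behind the sending VTEP."""
        self.mac_table[(vni, inner_src_mac.lower())] = outer_src_ip

    def outer_destination(self, vni: int, inner_dst_mac: str) -> str:
        """Pick the outer IP destination for a frame leaving this VTEP."""
        mac = inner_dst_mac.lower()
        group = self.vni_to_group[vni]
        if is_group_address(mac):
            return group                            # broadcast or multicast: flood
        return self.mac_table.get((vni, mac), group)  # unknown unicast: flood
```

A frame to an unlearned MAC goes to the multicast group for its VNI; once a reply arrives and `on_receive` records the sender, later frames to that MAC are sent unicast to the learned VTEP.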
BGP EVPN Control Plane for VXLAN
BGP EVPN provides an increasingly popular, standards-based approach to create VXLAN overlay networks meeting several objectives:
- Removes MAC address learning from the data plane, containing layer 2 broadcast and failure domains.
- Reduces broadcast and multicast traffic load through selective forwarding.
- Enables optimal forwarding, load balancing and convergence across the underlay IP network.
BGP EVPN uses the BGP protocol running in each switch to communicate MAC address and other information among network nodes, so we refer to it as a “protocol-based” control plane.
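Conceptually, each switch builds its MAC table from advertisements it receives rather than from observed traffic. The Python sketch below models only that idea: it is not a BGP implementation, the class and field names are assumptions, and the real EVPN MAC/IP advertisement route carries more attributes than the three shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MacAdvertisement:
    """Simplified stand-in for an EVPN MAC advertisement."""
    vni: int
    mac: str
    next_hop_vtep: str  # IP address of the VTEP that owns this MAC


class ControlPlaneMacTable:
    """MAC table populated by control plane updates, not by flooding."""

    def __init__(self):
        self.entries = {}  # (VNI, MAC) -> remote VTEP IP

    def advertise(self, route: MacAdvertisement) -> None:
        self.entries[(route.vni, route.mac.lower())] = route.next_hop_vtep

    def withdraw(self, route: MacAdvertisement) -> None:
        self.entries.pop((route.vni, route.mac.lower()), None)

    def lookup(self, vni: int, mac: str) -> Optional[str]:
        """Return the VTEP for a MAC, known before any data traffic is sent."""
        return self.entries.get((vni, mac.lower()))
```

Because an entry exists as soon as the owning VTEP advertises it, a remote switch can forward the first frame directly instead of flooding it, and a withdrawal removes the entry when the host goes away.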
Configuring BGP EVPN services on every switch in the network can be complex, time-consuming and error-prone, so some network operators look to network automation tools, which may include SDN automation, to reduce complexity and improve provisioning speed.
SDN Control Plane and Automation for VXLAN
SDN can be used not just to automate BGP EVPN, but also to provide a protocol-free alternative to it. An SDN control plane can achieve all the same objectives described above for BGP EVPN without requiring BGP EVPN to be configured on every switch. In an SDN-enabled VXLAN overlay, the SDN control plane takes care of MAC address learning and efficient forwarding, while also providing comprehensive end-to-end network automation, which can result in a network that is orders of magnitude simpler to configure and operate.
Figure 6 compares the operational complexity to configure a single service (in this case a Layer 3 virtual routing and forwarding, or VRF instance) across a 128-node VXLAN overlay fabric. Applying SDN automation to BGP EVPN configuration can result in an order of magnitude simplification versus manual configuration, while adopting a full SDN automation approach can simplify the task by roughly three orders of magnitude.
Figure 6: SDN Automation Benefits for VXLAN Overlay Provisioning
To learn more about VXLAN control plane options, see my blog, BGP EVPN for Scaling Data Center Fabrics, which describes how BGP EVPN works, compares the pros and cons of BGP EVPN and SDN control planes, and describes how they can be used together to meet a wide range of fabric scaling and interoperability scenarios.
VXLAN has become the most popular protocol for overlay network virtualization in data center fabrics due to its advantages over a long list of alternatives. When implemented in hardware-based VTEPs in switches and DPUs and combined with a BGP EVPN or SDN control plane and network automation, VXLAN-based overlay networks can provide the scalability, agility, high performance and resilience needed for distributed cloud networking into the foreseeable future.