Overlay networks are not new by any means, but they are becoming more and more mainstream across multiple places-in-the-network (PINs). While not the primary subject of this blog, one form of overlay network that has become popular over the last few years is SD-WAN. SD-WAN focuses on branch and home office connectivity, where IPsec tunnels are created over a layer 3 physical underlay network such as broadband cable, fiber-to-the-home or dedicated internet access circuits and then subsequently tunneled across an IP or MPLS backbone. This overlay approach facilitates a homogeneous network that can be automated via SDN over a heterogeneous underlay with improved scalability, application layer visibility, reliability and manageability.
We have seen a similar networking trend inside single data centers and across multi-site data centers, where overlays have continued to gain in importance and traction to support the network scale and agility required by private cloud. In the first blog in this series, “Trends in Data Center Networking: Past to Future,” Jay Gill talked about the rise of private cloud and how the best practice for data center underlay networks, as pioneered by the hyper-scalers, is a leaf/spine fabric architecture with a Layer 3 BGP control plane. He also touched on data center overlays, and this blog will dive deeper into that topic.
The data center overlay networks of today typically provide a fully meshed fabric of VXLAN tunnels across a Layer 3 IP underlay, providing network virtualization: a homogeneous logical network defined in software and abstracted from the physical hardware of the underlay. This virtual network overlay can be deployed inside a single data center or across multiple data centers to provide dramatically increased scale and agility, including automation via software defined networking (SDN). With a logical virtualized overlay, network services can now be deployed in minutes instead of the days or weeks typically required when services are deployed from the underlay. Frequently changing underlay configurations is not only slow but also risky, since human error can create performance issues or even network outages.
The bottom line: the data center network overlay enables the network operations (NetOps) team to move at the speed of cloud by delivering reliable and high-performance Layer 2 and Layer 3 network services in minutes to support their internal developer operations (DevOps) customers while maintaining a resilient data center architecture with robust uptime.
What Is A Data Center Overlay Network and Network Virtualization?
Data center overlays are a bit more complex than the SD-WAN example given earlier because they need to support sophisticated layer 2 as well as layer 3 network services. Layer 2 is needed across multiple racks and also across geographically distributed DC sites because there are numerous use cases where a layer 2 adjacency is needed. For example, it simplifies workload mobility (e.g. vMotion), where the IP address of the workload stays consistent and keeps the same default gateway as it moves from site to site. Stretched Layer 2 is also the best solution for hyperconverged infrastructure (HCI), which often relies on the vSAN protocol, which performs best over layer 2. There are numerous other examples. We’ve already discussed in the first blog that a layer 3 underlay has been proven to be the most scalable and that a layer 2 underlay topology is untenable in a single DC and only gets worse across multiple DCs. Thus, an overlay running over a layer 3 underlay is the best way to support a combination of layer 2 and layer 3 services across multiple sites.
Data center overlay networks support virtualization by creating a logical network defined in software using tunneling protocols like VXLAN or GENEVE. Since VXLAN is the predominantly used overlay data plane tunneling protocol in data center networks, let’s focus on that. VXLAN is designed to tunnel over an IP based network underlay and transport various layer 2 and layer 3 network services. As described in the previous blog, the industry best practice is a (3-stage) Clos fabric underlay using layer 3 BGP with BGP unnumbered, which provides physical connectivity with non-blocking performance, ECMP-based multi-pathing for load balancing, high availability and single-hop predictable latency between any two leaf switches to ensure consistent performance.
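To make the VXLAN encapsulation mentioned above a little more concrete, here is a minimal Python sketch of the 8-byte VXLAN header as defined in RFC 7348: a flags byte with the "VNI valid" bit set, followed by a 24-bit VXLAN Network Identifier (VNI). The function names are hypothetical and chosen only for illustration; the outer Ethernet, IP and UDP headers that a real VTEP also adds are omitted.

```python
import struct

VXLAN_UDP_PORT = 4789         # IANA-assigned UDP destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag: the VNI field carries a valid value


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # First 32-bit word: flags (8 bits) followed by 24 reserved bits.
    # Second 32-bit word: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the VNI-valid flag."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prefix an inner Ethernet frame with a VXLAN header (outer headers omitted)."""
    return build_vxlan_header(vni) + inner_frame


header = build_vxlan_header(5000)
print(header.hex(), parse_vni(header))
```

In a switch-based overlay this work is done by the switching chip rather than in software; the sketch only shows what is being added to each frame.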
Once the underlay is established, the VXLAN tunnels are typically set up on top of this underlay in a mesh from every leaf switch to every other leaf switch to create an overlay fabric, also providing a single hop in the logical network among all the leaf switches. This approach offers a major simplification in provisioning network services for East-West traffic within the private cloud, because these services have to be configured only on the leaf switches, independently from the rest of the underlay network (the two leaf switches could literally be on either side of the world). The VXLAN tunnels are initiated and terminated on the leaf switches, acting as tunnel termination endpoints (also referred to as VXLAN Tunnel End Points or VTEPs), where the network traffic is encapsulated or decapsulated at wirespeed by the switching chip. These VTEPs can also be deployed on each and every server as an alternative approach, which I refer to as “compute-based,” and I’ll talk more about that later in the blog.
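As a rough illustration of what a leaf-to-leaf mesh implies, the following Python sketch enumerates the tunnels in a full mesh and counts them with the usual n(n-1)/2 formula. The leaf names and fabric sizes are made up for the example, and it assumes one bidirectional tunnel per pair of leaf VTEPs.

```python
from itertools import combinations


def full_mesh_tunnels(leaf_names):
    """Return every unordered pair of leaves, one tunnel per pair of VTEPs."""
    return list(combinations(leaf_names, 2))


def tunnel_count(num_leaves):
    """Number of bidirectional tunnels in a full mesh of num_leaves VTEPs."""
    return num_leaves * (num_leaves - 1) // 2


leaves = ["leaf1", "leaf2", "leaf3", "leaf4"]   # hypothetical leaf names
for a, b in full_mesh_tunnels(leaves):
    print(f"{a} <-> {b}")

for n in (4, 32, 256):
    print(n, "leaves ->", tunnel_count(n), "tunnels")
```

The count grows quadratically with the number of leaves, which is one reason the mesh is something you want the fabric to build for you rather than configure by hand.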
The overlay then becomes the network service delivery mechanism and since it is defined and managed in software it is highly agile and, if implemented properly, can deliver consistent services across the entire fabric in minutes if not seconds. One important benefit of this approach is that it allows the underlay to be simply configured with a focus on scale and stability – the lack of having to make numerous configuration changes to the underlay avoids human errors and increases network performance and uptime.
The capabilities of overlays have grown substantially over time and they can now provide a full suite of Layer 2 and Layer 3 network services. For Layer 2 this still includes the very valuable feature of stretching VLANs for workload mobility and for supporting protocols like vSAN used for storage and hyper-converged infrastructure (HCI) deployments. However, it now also includes more sophisticated services such as Q-in-Q and Bridge domains to support more complex architectures and scale multi-tenant environments beyond the 4,096 traditional VLANs. Layer 3 services leverage distributed routing with anycast gateways, which means that routing happens locally on every leaf switch, avoiding inefficient hairpinning. The layer 3 services include distributed VRFs for IPv4/IPv6 unicast and multicast routing, policy-based routing, inter-subnet route leaking and other advanced layer 3 services. Overlays also support network segmentation, load balancing, security policies like ACLs and much more. The power of the overlay fabric is that, depending on the implementation, it can offer a central policy engine or SDN control capability and deploy a layer 2 or layer 3 service across the entire fabric with one or two commands. This speeds up service deployment, dramatically trims the amount of configuration work, increases agility and reduces human error.
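The scale limit mentioned above comes down to identifier width. The short Python sketch below works through the arithmetic: the 12-bit VLAN ID of IEEE 802.1Q, two stacked tags for Q-in-Q, and the 24-bit VNI used by VXLAN. The variable names are only for illustration.

```python
VLAN_ID_BITS = 12    # width of the 802.1Q VLAN ID field
VNI_BITS = 24        # width of the VXLAN Network Identifier field

vlan_ids = 2 ** VLAN_ID_BITS        # 4,096 possible VLAN IDs
usable_vlans = vlan_ids - 2         # IDs 0 and 4095 are reserved in 802.1Q
qinq_combinations = vlan_ids ** 2   # outer tag x inner tag with Q-in-Q
vxlan_segments = 2 ** VNI_BITS      # possible VNI values

print(f"VLAN IDs:           {vlan_ids:,} ({usable_vlans:,} usable)")
print(f"Q-in-Q tag pairs:   {qinq_combinations:,}")
print(f"VXLAN VNI values:   {vxlan_segments:,}")
```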
Approaches to Data Center Overlay Networks
Overlay Networks can be instantiated in many different ways. As discussed above there are switch-based overlays and compute-based overlays (also known as host-based overlays) as two broad categories, each with their advantages and disadvantages.
Compute-based overlays (Host-based Overlay)
One of the main advantages of compute-based overlays is that they can perform very fine-grained microsegmentation, including between VMs in a single host, with no need to travel to the top of rack switch.
However, they have a number of disadvantages. First, they can be very expensive because the typical licensing model requires a multi-thousand dollar license per processor on every server, along with multiple additional external hardware elements and software licenses for gateways and controllers. Secondly, some of these solutions cannot aggregate devices that are non-virtualized, including bare metal servers, IoT gateways and access and aggregation switches that need to feed into a segmented data center. Thirdly, this is an overlay-only solution and it is up to the NetOps team to figure out how to integrate it with the underlay, which is often a swivel chair exercise with two different management consoles. Finally, compute-based overlays typically consume significant CPU cycles on the host to process packets, stealing those cycles from the applications that the network is built to support.
This latter issue is one of the reasons for the rise of SmartNICs containing dedicated packet processing and CPUs in order to offload the host and solve this problem. SmartNICs have implications both for compute-based and switch-based overlays and will be covered in a future blog.
Compute-based overlays all offer software defined networking (SDN) control planes.
Advantages of switch-based overlays are that they require many fewer licenses (there are typically 20 or more servers per switch), and thus this approach is typically much more cost effective. Furthermore, the VTEPs are hardware accelerated by the dedicated packet processor in the switch and therefore can operate at wirespeed with absolutely no tax on the host CPU. The switches can aggregate any device, whether it is virtualized or not, into the overlay. This enables, for example, aggregating IoT data from non-virtualized IoT gateways into a separate segment so it is isolated from corporate data, thus reducing the attack surface for the enterprise. Finally, these switch-based solutions unify the management of underlay and overlay networks, enabling the solution to work right out of the box and to present a single management console that can be used to automate management of the underlay and overlay networks. This is typically dependent on the control plane approach and whether or not SDN is used for the control plane, the topic of the next section.
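To put the licensing difference in rough numbers, here is a small Python sketch that counts licenses under each model. It takes the "20 servers per switch" ratio from the text; the two-socket servers, the one-license-per-switch model and the 400-server data center are assumptions made purely for illustration, and redundant leaf pairs or gateway and controller licenses are ignored.

```python
import math


def compute_based_licenses(servers, sockets_per_server=2):
    """Per-processor licensing: one license per CPU socket on every server."""
    return servers * sockets_per_server


def switch_based_licenses(servers, servers_per_switch=20):
    """Assumed per-switch licensing: one license per leaf switch."""
    return math.ceil(servers / servers_per_switch)


servers = 400   # hypothetical data center size
print("compute-based licenses:", compute_based_licenses(servers))
print("switch-based licenses: ", switch_based_licenses(servers))
```

Actual costs depend on each vendor's price list, so the sketch compares license counts only, not dollars.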
Tunnels terminate on switch
Examples: OS10, Pluribus, IP Infusion, SONiC | Cisco NXOS, Juniper, Arista, Cumulus/Mellanox/Nvidia
Pro: No per-CPU or per-host licenses, hardware accelerated VTEPs, no CPU tax on host, can aggregate non-virtualized devices into overlay, integrated underlay and overlay for single pane of glass
Con: No segmentation for VMs residing on the same host unless traffic travels to the leaf switch at top of rack
Tunnels terminate on host
Examples: VMware NSX, Juniper Contrail, Nokia Nuage
Pro: Network segmentation for VMs on same host
Con: Per-host licensing, difficult/expensive to aggregate non-server based devices into network segments (e.g. IoT GW), significant integration complexity, requires separate management of underlay, CPU tax
The one disadvantage of many switch-based overlays is the inability to provide segmentation for VMs running on the same host. Switch-based overlays can provide comprehensive segmentation for the entire data center, including microsegmentation, but if two VMs or containers are on the same host, then moving traffic into a different segment or keeping traffic in the same segment requires the traffic to travel to the leaf switch at the top of rack.
Switch-based overlay control planes can be protocol based leveraging BGP EVPN or offer an SDN control plane, two very different approaches.
Understanding Control Plane Implications for Data Center Overlay Networks
The term control plane refers to how messages are exchanged among networking devices to determine where endpoints (e.g. containers, VMs, bare metal databases) are located, and define the set of policies to forward and secure the overlay communications among these endpoints. Control planes achieve this by populating forwarding tables at layer 2 (MAC address) or layer 3 (IPv4/IPv6 addresses) at each hop of the network (in each switch or router or compute node endpoint in some cases). An example of an underlay IP control plane is the IETF OSPF protocol or BGP protocol. As mentioned above, all compute-based overlays offer an SDN control plane. However, switch-based implementations have taken two different approaches here – BGP EVPN and SDN.
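As a simplified picture of what "populating forwarding tables" means for an overlay, the Python sketch below models a layer 2 table that maps an endpoint's MAC address within a segment (VNI) to the VTEP behind which it currently lives. The class, its methods, the addresses and the VNI are all hypothetical; whether the entries arrive via BGP EVPN advertisements or an SDN control plane, the resulting table plays the same role.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class OverlayForwardingTable:
    """Toy layer 2 overlay table: (VNI, MAC) -> IP address of the remote VTEP."""
    entries: Dict[Tuple[int, str], str] = field(default_factory=dict)

    def learn(self, vni: int, mac: str, vtep_ip: str) -> None:
        """Install or update the location of an endpoint (control plane action)."""
        self.entries[(vni, mac.lower())] = vtep_ip

    def lookup(self, vni: int, mac: str) -> Optional[str]:
        """Return the VTEP to tunnel to, or None if the endpoint is unknown."""
        return self.entries.get((vni, mac.lower()))


table = OverlayForwardingTable()
table.learn(10100, "00:50:56:AA:BB:01", "10.0.0.11")   # VM behind leaf 1
print(table.lookup(10100, "00:50:56:aa:bb:01"))

# The VM moves to another rack or site: only its location changes.
table.learn(10100, "00:50:56:AA:BB:01", "10.0.0.27")
print(table.lookup(10100, "00:50:56:aa:bb:01"))
```

The second `learn` call mirrors the workload mobility case: the endpoint keeps its MAC and IP address, and the control plane simply updates which VTEP it sits behind.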
BGP EVPN is a standard, multi-vendor protocol-based overlay control plane which leverages the well proven internet scale BGP protocol; virtually every networking vendor supports BGP EVPN as the control plane to manage the overlay fabric and its associated network services. The challenge with BGP EVPN, though, is that it is a very heavy protocol in terms of configuration requirements. With BGP EVPN there are numerous commands that must be programmed into each switch to deploy a single service. In the figure below you can see that deploying a single VRF for a new tenant in a 32 switch fabric requires 832 commands. For a 256 switch fabric this would be over 5000 commands. Tools like Python and Ansible can help automate some of this through scripting, but it is still extremely complex and the NetOps team still needs to know BGP EVPN fairly intimately. Alternatively, third-party automation solutions can be purchased and deployed, but this brings additional expense, and those point-in-time scripts and third-party solutions tend to fall out of sync as the vendor providing the networking solution continues to enhance it and release new versions of code. Here again, if the third-party tool does not support a particular feature, the NetOps team will need to know BGP EVPN fairly intimately in order to configure new features on each switch directly.
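The command counts quoted above can be sanity-checked with a few lines of Python. The 832-command and 32-switch figures come from the text; the assumption that the per-switch count stays constant as the fabric grows is mine and is only a rough extrapolation.

```python
COMMANDS_FOR_32_SWITCHES = 832   # figure quoted in the text
REFERENCE_SWITCHES = 32

# 832 commands spread over 32 switches works out to 26 commands per switch.
commands_per_switch = COMMANDS_FOR_32_SWITCHES // REFERENCE_SWITCHES


def estimated_commands(switches, per_switch=commands_per_switch):
    """Linear extrapolation: assumes the same number of commands on every switch."""
    return switches * per_switch


print("commands per switch:", commands_per_switch)
print("256-switch estimate:", estimated_commands(256))
```

Under that linear assumption a 256 switch fabric lands at 6,656 commands, consistent with the "over 5000" stated above.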
As you can see in the figure above, there is another approach which uses an SDN control plane for integrated automation of the underlay and the overlay. This approach has the benefit of architecting the control plane into the networking solution to radically simplify deploying and operating the data center fabric without the complexity of traditional networking protocols. In the case of the Pluribus Adaptive Cloud Fabric SDN control plane, the operator only deals with logical network objects (e.g. VLANs, VRFs, subnets etc.) and uses a simple and intuitive API to create, modify and delete these objects with a single command across all the nodes of the fabric. No protocol configuration is required by the end user. For example, deploying a VRF for a customer across an entire 32 switch fabric requires only 2 CLI commands, issued on any single switch in the fabric with the qualifier “scope fabric”; the SDN control plane of the fabric then takes care of configuring all of the switches. These 2 commands can be issued via CLI, programmed via a REST API for an infrastructure as code approach or executed through the Pluribus UNUM graphical user interface. For the 256-switch fabric requiring over 5000 commands in BGP EVPN for a new service, Pluribus still only requires the same 2 commands for the entire fabric!
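The fabric-scope idea can be pictured with a toy model in Python. This is not the Pluribus API or CLI: the class, method and object names below are invented for illustration, and the two placeholder objects simply stand in for the 2 commands mentioned above. The point is only that the operator issues a command once and the control plane replicates the logical object to every switch.

```python
class ToyFabric:
    """Illustrative model of a fabric that replicates logical objects to all nodes."""

    def __init__(self, switch_names):
        self.configs = {name: [] for name in switch_names}
        self.operator_commands = 0   # commands typed by the operator

    def apply(self, logical_object, scope="fabric", local_switch=None):
        """Apply one operator command, fabric-wide or to a single switch."""
        self.operator_commands += 1
        targets = list(self.configs) if scope == "fabric" else [local_switch]
        for name in targets:
            # In a real fabric the control plane would program each switch here.
            self.configs[name].append(logical_object)


fabric = ToyFabric([f"switch{i}" for i in range(1, 33)])   # 32 switches
fabric.apply("placeholder-object-1")   # stands in for the first command
fabric.apply("placeholder-object-2")   # stands in for the second command

print("operator commands:", fabric.operator_commands)
print("switches configured:", sum(1 for c in fabric.configs.values() if len(c) == 2))
```

Growing the switch list to 256 entries leaves `operator_commands` at 2, which is the property the paragraph above describes.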
Since this automation is built into the network operating system software running on the switches, the SDN overlay automation is fully integrated with the underlying features of the network operating system, which provides both the underlay and overlay functionality right out of the box. This makes the solution easy to deploy on Day 0, and the NetOps team does not need to know every knob in BGP EVPN to support a deployment. An SDN approach also integrates the automation of both the underlay and the overlay, further simplifying network operations for Day 1, 2 and N. This of course means the NetOps team can spend their time delivering services at the speed of cloud and focusing on strategic initiatives to align with the business instead of configuring tens or even hundreds of switches in their data centers.
Switch-based Overlay, BGP/EVPN Control Plane
Tunnels terminate on switch
Examples: OS10, IP Infusion, SONiC | Cisco NXOS, Juniper, Arista, Cumulus/Mellanox/Nvidia
Pro: No need to pay for SDN functionality
Con: Requires well-resourced IT team for box-by-box configs, scripting, programming (opex). May incur expenses for external automation solutions which can struggle to stay synched with each vendor’s solution.
Switch-based Overlay, SDN Control Plane
Tunnels terminate on switch
Examples: Pluribus Networks (controllerless) | Cisco ACI (controller-based)
Pro: Radically simplifies NetOps, integrates underlay and overlay, works out of box with little integration
Con: SDN control plane is not free. Controller-based solutions can be very expensive due to hardware dependencies & external controller costs.
Because Pluribus is focused on full standards compliance we support automated BGP EVPN to interoperate with third party fabrics with EVPN control planes. Inside our fabric we use SDN to automate the underlay and overlay but outside the fabric we offer standard protocols in the underlay and overlay to interoperate with any third party networking solutions to ensure easy insertion into existing network infrastructure. You can read more about Pluribus EVPN in this blog.
Overlays across Geo-distributed data centers
Overlays can be extremely powerful for stretching private cloud services across geo-distributed sites. This can support a disaster recovery scenario with a primary data center and a hot standby data center, where the NetOps team only needs to change the advertised IP address to the firewall in the standby data center to fail over quickly. No other network configuration needs to be done and no other IP addresses need to be changed, so this can be done in minutes, dramatically speeding up failover. Alternatively, Active-Active data center architectures can be deployed to deliver instant failover as well as data center resource sharing to optimize infrastructure costs. Furthermore, with the move to deploy data centers into more distributed colocation facilities and into smaller edge data center sites, an overlay fabric can facilitate workload mobility for migration purposes, to improve end-user experience or to handle spikes in demand.
However, extending switch-based overlays using a BGP EVPN control plane across geographically separated data centers can be very complex. SDN can simplify the deployment and management of overlays across sites, but the cost of the SDN implementation must be examined carefully. If it is a controller-based SDN solution, it will typically require three SDN controllers at every site as well as three multi-site directors (controller of controllers) to stitch together the geo-distributed overlay. This can be complex and costly, and it consumes space and power, which becomes more significant as data center sites become smaller and more space and power constrained. A controllerless, distributed SDN approach is much more cost effective and elegant when it comes to stretching the overlay across 2 or more data centers. You can read a bit more in the blog “Perspective: Controller vs Controllerless SDN Solutions.”
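The controller footprint described above is easy to tally. The following Python sketch uses the "three controllers per site plus three multi-site directors" figures from the text; the function name and the example site counts are illustrative only.

```python
def controller_nodes(sites, controllers_per_site=3, multi_site_directors=3):
    """Count external controller nodes for a multi-site, controller-based SDN design."""
    if sites < 2:
        raise ValueError("this tally is meant for multi-site deployments")
    return sites * controllers_per_site + multi_site_directors


for sites in (2, 4, 10):
    print(f"{sites} sites -> {controller_nodes(sites)} controller nodes")
```

A controllerless design, in which the control plane runs on the switches themselves, adds no external nodes in this tally, which matters most at small edge sites where space and power are tight.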
Conclusion & Takeaways
In the first blog in this series we talked about the key trends in data center networking at a high level including the move from three tier to leaf and spine and from a layer 2 to a layer 3 underlay as well as the rise of overlays. This blog, in turn, drilled down into data center overlay networks and how they can automate, virtualize and make data center networks much more agile inside a single data center or across multiple data centers. Ultimately the decision on how to deploy the overlay is multi-variable and there are numerous solutions: Compute-based or switch-based? BGP EVPN or SDN control plane? Controller-based SDN or controllerless SDN? If the right choices are made it is possible to radically simplify data center network operations, improve security, accommodate shifts in demand, make the on-prem data center more resilient and most importantly, increase service delivery agility and speed to deliver services to the DevOps team.
In the next blog, “Understanding Various Approaches to Data Center Network Automation,” I will take a deeper dive into some popular automation approaches including Linux-based scripting tools, automation frameworks such as Ansible and external automation solutions.
If you would like to learn more about Pluribus Networks Netvisor ONE operating system for disaggregated bare metal switches and our Adaptive Cloud Fabric distributed, controllerless SDN solution to automate underlay and overlay networks you can request a demo here and we’ll get you set up.
Also, you might want to consider attending our April 6th 2021 Webinar hosted at 8:00 AM PDT “Data Center Architectures for Amazing Application Availability” co-presented with Dell Networks.