“Reality is one, though wise men speak of it variously.”
— The Rigveda, ~1500 BCE
Listening to smart people discussing “Data Center Interconnect” (DCI), I am reminded of the parable of the blind men and the elephant, in which each person feels a different part of the elephant (trunk, tusk, body, leg) and arrives at a very different understanding of what an elephant is like (a snake, a sword, a wall or a tree).
What is Data Center Interconnect?
At first glance, the phrase “Data Center Interconnect” appears self-explanatory – two or more data centers interconnected by some type of network – but that simplicity hides a lot. Let’s break it down by examining different parts of the data center interconnectivity elephant, starting with optical transport.
DCI transport technologies and services
At the lowest level of any data center interconnect link is some type of optical transmission. If the bandwidth required between data centers is very low, say up to 1 gigabit per second (Gbps), DCI traffic might be carried on a shared packet network such as an MPLS VPN, which in turn shares optical transmission capacity on a service provider’s network. Capacities in the range of a few Gbps up to 100 Gbps may be most economically provided by dedicated transport services such as an Ethernet private line (EPL) or wavelength service.
When dark fiber is cheap and distances are very short (< 40 km), it may be more economical to lease a fiber pair for each 10G or 100G link and use pluggable long-reach (“LR”) or extended reach (“ER”) optical transceivers that fit directly in a router or switch SFP+ or QSFP port. (Also note that new pluggable DWDM technology is emerging that will enable both greater bandwidth and distance, and blur the lines with optical DCI systems – more on that below.)
As DCI bandwidth grows toward multiples of 10G, and definitely at multiples of 100G, it becomes increasingly attractive to employ an optical DCI system.
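The rules of thumb above can be condensed into a small decision helper. This is only a sketch: the function name, its parameters and the exact cut-off values are assumptions chosen to mirror the ranges quoted in the text, and real decisions depend on local pricing and availability.

```python
def suggest_dci_transport(bandwidth_gbps: float,
                          distance_km: float,
                          dark_fiber_is_cheap: bool = False) -> str:
    """Return a rough transport suggestion based on the article's rules of thumb.

    The thresholds (1 Gbps, 100 Gbps, 40 km) come from the text; treating them
    as hard cut-offs is a simplifying assumption.
    """
    if bandwidth_gbps <= 1:
        # Very low bandwidth: share a packet network
        return "shared packet network (e.g. MPLS VPN)"
    if dark_fiber_is_cheap and distance_km < 40 and bandwidth_gbps <= 100:
        # One leased fiber pair per 10G/100G link with LR/ER pluggables
        return "dark fiber with pluggable LR/ER optics"
    if bandwidth_gbps <= 100:
        return "dedicated transport service (EPL or wavelength service)"
    # Multiples of 100G: an optical DCI system becomes attractive
    return "optical DCI system (DWDM)"


print(suggest_dci_transport(0.5, 500))
print(suggest_dci_transport(10, 20, dark_fiber_is_cheap=True))
print(suggest_dci_transport(400, 80))
```

In practice the boundaries overlap, which is why the text speaks of what “may be” most economical rather than fixed rules.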
To an optical engineer, “DCI” means an optical transport application often provided by specialized dense wavelength division multiplexing (DWDM)-based DCI transport appliances. A typical Optical DCI appliance provides a simple muxponder function (digital multiplexer + coherent optical transponder), carrying Ethernet traffic transparently over optical wavelengths. Multiple 10 Gigabit Ethernet (GbE) or 100 GbE client ports may be mapped into a DWDM wavelength with 200G, 400G or even higher capacity, as shown below.
Note that the focus of the Optical DCI system is simply transporting Ethernet bits without regard to the architecture of the data center networks or the protocols in use.
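To make the muxponder mapping concrete, the short sketch below counts how many client ports of a given speed fit into one wavelength. It is a back-of-the-envelope illustration only: the function name is hypothetical, and the calculation ignores framing and error-correction overhead as well as the port counts of any real appliance.

```python
def client_ports_per_wavelength(wavelength_gbps: int, client_gbps: int) -> int:
    """Number of whole client ports that fit into one wavelength (nominal rates)."""
    if wavelength_gbps <= 0 or client_gbps <= 0:
        raise ValueError("rates must be positive")
    return wavelength_gbps // client_gbps


# Nominal wavelength and client rates mentioned in the text
for wavelength in (200, 400):
    for client in (10, 100):
        ports = client_ports_per_wavelength(wavelength, client)
        print(f"{wavelength}G wavelength carries {ports} x {client} GbE")
```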
For very high capacity DCI links, several wavelengths may be optically multiplexed on a single fiber pair. A single Optical DCI appliance may support multiple transponders and wavelengths for a total capacity of multiple Terabits per second (Tbps), and more wavelengths can be combined using a DWDM optical multiplexer, as shown below. Those wavelengths may all be transmitted point-to-point between the same two data center sites, or optically routed separately to different destinations using an optical line system (OLS) with more complex optical devices such as Reconfigurable Optical Add-drop Multiplexers (ROADMs).
The power and simplicity of these systems, originally designed for data center interconnect, have led to their deployment in applications far beyond DCI, and such “compact modular optical systems” are now the fastest growing market segment of optical transport. The most advanced coherent optical systems entering the market today can transmit as much as 30-40 Tbps on a single fiber pair over at least 80-100 km using the traditional C-band of optical frequencies, and can double that capacity by also using the adjacent L-band. The capacity drops as distance grows, but even on subsea links between continents, capacities of over 20 Tbps per fiber pair are now achievable. Of course, only the largest hyperscale data center operators and service providers need anywhere near that much capacity on a single link, but the same advanced optical technology benefits DCI users at much smaller scale.
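The capacity arithmetic is simple enough to sketch. The helpers below are illustrative only (the names are made up for this example): one rounds a capacity target up to a whole number of wavelengths, and the other applies the “double that capacity” rule for adding the L-band to the C-band range quoted above.

```python
import math


def wavelengths_needed(target_gbps: int, per_wavelength_gbps: int) -> int:
    """Whole wavelengths required to carry target_gbps (nominal rates, no overhead)."""
    if target_gbps < 0 or per_wavelength_gbps <= 0:
        raise ValueError("target must be >= 0 and wavelength rate must be > 0")
    return math.ceil(target_gbps / per_wavelength_gbps)


def c_plus_l_capacity_tbps(c_band_tbps: float) -> float:
    """Capacity per fiber pair when the L-band doubles the C-band capacity."""
    return 2 * c_band_tbps


print(wavelengths_needed(1600, 400))   # 1.6 Tbps over 400G wavelengths
print(wavelengths_needed(1000, 400))   # a partial wavelength still costs a whole one
print([c_plus_l_capacity_tbps(c) for c in (30, 40)])  # C-band range from the text
```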
Pluggable DWDM Optics for DCI
The next evolution of DWDM for DCI and other applications is the integration of coherent optical engines into the form factor of a QSFP or OSFP pluggable module that plugs directly into a router or switch port. Each module will produce a tunable, coherent DWDM wavelength capable of being transported at least 80-100 km, possibly much more. 100G and 400G versions of these modules are in development and field trials today. For the first time, coherent optical engines will be integrated into switches and routers without sacrificing any port density or capacity compared to non-coherent pluggable optics. This will make the long-promised vision of “IP + Optical” integration much more of a reality than ever before and will soon change the way optical DCI is deployed for many DCI applications. With pluggable DWDM, transponder/muxponder appliances will no longer be needed, and only an external optical mux/line system will be required to carry multiple wavelengths on a single fiber. Compact modular optical systems will continue to offer advantages for many applications, including longer-range DCI, but they will likely concede a substantial fraction of the metro DCI market to pluggable DWDM.
Data Center Interconnect Applications
Whatever type of transport is chosen to enable data center interconnect, the next question is how DCI is used to link data center networks. Once again, there are many different ways to look at the elephant.
DCI for Peering and Cloud Interconnection
DCI systems are increasingly being used to enable remote peering and cloud interconnection. High-performance hybrid multi-cloud architectures demand direct, high-capacity connectivity between private cloud infrastructures and providers of public clouds and digital services. However, those service providers are not always located within the same building, so high capacity DCI links may be needed to connect to the sites where they are located. Some enterprises build their own DCI links for this purpose. Operators of colocation facilities or multi-tenant data centers (MTDCs) who have multiple facilities in a region are increasingly solving this interconnection problem for their tenants with software-defined interconnection services that provide transparent layer 2 (L2) interconnection between tenants in different data centers. The figure below illustrates how the underlying DCI transport links support a software-defined interconnection fabric that is logically isolated from the tenant data center networks, and this allows the tenants to peer directly at layer 3 (L3) with cloud providers in remote data centers.
DCI for Business Continuity & Disaster Recovery
Many business continuity and disaster recovery (BCDR) strategies are built around active-standby approaches where DCI is used mainly for data backup, replication and/or mirroring. If the primary data center site goes down, data that has been replicated to the standby site can be recovered and applications can be restarted within a (hopefully) short outage window. The data center networks themselves are essentially isolated and independent and the standby data center is underutilized during normal operations. While such strategies may still make sense for some data center operators, others are looking to use their multiple data center sites in a more unified way to improve their BCDR approach, increase application availability and simplify operations.
Moving Beyond DCI to Data Center Unification
The above applications for DCI are certainly valuable but they leave each data center network isolated, operating as a stand-alone entity. How can we build on the connectivity provided by DCI to unify data centers into a seamless whole?
The ideal of a multi-site data center fabric is to completely eliminate the boundaries between data centers and the resources that they house. Workloads can move among any sites in the fabric for capacity or availability reasons with no IP readdressing needed, and continue to be accessed from other sites as if they were local. Compute and storage resource utilization can be optimized across all sites. Active-active strategies can be implemented to increase application availability and simplify planned maintenance and failure recovery.
Overlay Technologies applied to DCI
A key building block of data center unification is the ability to extend layer 2 subnets (including VLANs and multipoint bridge domains) between leaf-spine pods in geographically separated data centers.
Why is stretching a subnet important? One reason is pervasive use of broadcast and multicast for things like service discovery. Another is the simplicity of moving things – such as VMs and their IP addresses – within a subnet vs. between subnets. Some devices and applications – both physical and virtual, legacy and modern – actually require layer 2 adjacency to work at all.
But stretching a subnet the wrong way can create havoc: traffic flooding across a large layer 2 broadcast domain, bandwidth inefficiencies inherent in technologies such as spanning tree, asymmetric traffic patterns, and a single failure domain spanning both data center sites. Numerous approaches have been tried in the past and found lacking.
Today it is commonly agreed that the right approach is to build separate underlay leaf-spine network domains in each data center, using a robust underlay layer 3 protocol such as BGP, and then use overlay encapsulation to stretch L2 and L3 services between them, across the underlying DCI links.
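To make “overlay encapsulation” less abstract, here is a minimal sketch of the 8-byte VXLAN header defined in RFC 7348, which carries a 24-bit VXLAN Network Identifier (VNI) in front of the original layer 2 frame. The helper names are hypothetical, and the outer Ethernet, IP and UDP headers that a real tunnel endpoint adds are deliberately left out.

```python
import struct

VXLAN_UDP_PORT = 4789          # IANA-assigned destination UDP port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08    # "I" flag: the VNI field is valid


def build_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits)
    # Word 2: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the VXLAN header to an inner Ethernet frame (outer headers omitted)."""
    return build_vxlan_header(vni) + inner_frame


def parse_vni(packet: bytes) -> int:
    """Extract the VNI from a VXLAN-encapsulated payload."""
    flags_word, vni_word = struct.unpack("!II", packet[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


packet = encapsulate(b"\x00" * 14, vni=10010)  # dummy 14-byte inner Ethernet header
print(packet[:8].hex(), parse_vni(packet))
```

Because the stretched subnet is identified only by the VNI inside the tunnel, the DCI link underneath needs to provide nothing more than IP reachability between the tunnel endpoints. With an untagged outer Ethernet header and IPv4, the encapsulation adds roughly 50 bytes per packet (14 + 20 + 8 + 8), so the underlay MTU has to allow for that.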
An early approach to overlay networking, pioneered by Cisco, was called Overlay Transport Virtualization (OTV). Cisco has recently discarded OTV and moved on to two newer architectures: multi-POD ACI, shown below, which supports the idea of a unified data center fabric with layer 2 extension for active-active deployments, and multi-site ACI, which isolates data centers into different “availability zones.” Both of these architectures use a control plane based on BGP EVPN to create overlay networks of VXLAN tunnels across multiple sites, as do other vendors such as Cumulus Networks (more on VXLAN and BGP EVPN below).
Multi-POD ACI (source: Cisco)
As you can see from the multi-POD ACI diagram, the DCI connections are assumed to terminate on the spine switches and may use any type of “Inter-Pod Network (IPN)” providing IP connectivity.
An alternative topological approach, which reduces the complexity of spine switch configuration by removing the need for spine switches to participate in overlay service configuration, is to connect DCI to already existing border leaf nodes as shown below.
The Pluribus Adaptive Cloud Fabric can unify multi-site data centers in either topology, or in fact with any arbitrary topology (rings, partial meshes, etc.). In the simple case shown below, a Pluribus Multi-site Data Center Fabric logically unifies all of the leaf switches across two data centers. The Pluribus Netvisor ONE Network Operating System (NOS) and Adaptive Cloud Fabric software run on each leaf switch, creating a distributed, controllerless SDN control plane for provisioning overlay services using VXLAN tunnels. Spine switches may be part of the fabric but are not required to be.
In this diagram, it’s clear that the DCI link serves merely as an underlay network supporting the multi-site fabric. The only requirement of the DCI link is that it enables IP connectivity with sufficient performance, whether that is provided by a high-capacity optical DCI system, an Ethernet private line or any other type of wide-area network (WAN).
Comparing Overlay Fabric Approaches
On the surface, a Pluribus Multi-site Data Center Fabric looks similar to the other approaches discussed above, including Cisco multi-POD ACI. All of them use VXLAN encapsulation to create overlay tunnels that enable L2 and L3 services to stretch across the DCI links and the data center leaf-spine networks. All can therefore provide effective network virtualization that decouples the overlay network control plane from the underlay leaf-spine networks and DCI or WAN links.
But when you look at the control and management planes, the Adaptive Cloud Fabric offers distinct advantages over ACI and similar approaches based on EVPN.
- Controllerless: The Adaptive Cloud Fabric runs as a distributed application in every switch in the fabric, so it is inherently resilient and there is no need for the external controller clusters, deployed on multiple redundant servers at every site, that other vendors require. In the multi-POD ACI figure above, a single “APIC cluster” of controllers is split across two sites. This makes the controller cluster sensitive to latency, so inter-DC latency must be kept under 50 ms. It also makes the cluster vulnerable to loss of connectivity in the DCI or WAN link, creating potential single points of failure. Multi-site ACI reduces these APIC cluster dependencies, but the trade-off is that it does not provide a truly unified multi-site DC fabric. A Pluribus multi-site fabric does not force any such trade-off.
- Simple: The distributed SDN control plane of the Adaptive Cloud Fabric and built-in automation features such as auto-VXLAN tunnel creation simplify fabric-wide service and policy configuration. With ACI and other approaches based on EVPN, each node in the fabric must be configured separately for every service. For example, in a 32-node network, EVPN requires 640 lines of code per service while the Adaptive Cloud Fabric can be configured with three simple commands.
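The scaling argument in the last bullet can be worked through with the figures quoted there. The sketch below is arithmetic only: it assumes the per-node effort is spread evenly across nodes and grows linearly with node and service count, and that the fabric-wide command count does not grow with the number of nodes. Neither assumption is stated explicitly in the text, and the function names are hypothetical.

```python
NODES = 32
EVPN_LINES_PER_SERVICE = 640       # figure quoted for a 32-node network
FABRIC_COMMANDS_PER_SERVICE = 3    # figure quoted for the Adaptive Cloud Fabric

# Implied per-node effort for one service: 640 / 32
lines_per_node = EVPN_LINES_PER_SERVICE // NODES


def per_node_total(nodes: int, services: int, per_node_lines: int = lines_per_node) -> int:
    """Total lines when every node is configured separately for every service."""
    return nodes * services * per_node_lines


def fabric_wide_total(services: int, commands: int = FABRIC_COMMANDS_PER_SERVICE) -> int:
    """Total commands when each service is configured once for the whole fabric."""
    return services * commands


print(lines_per_node)
print(per_node_total(NODES, services=10), fabric_wide_total(services=10))
```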
The bottom line is that the Pluribus Adaptive Cloud Fabric offers a distinctly superior solution for unifying multi-site data centers over any type of DCI network.
Evolving from DCI to Multi-site DC Unification
For some customers, the first step toward unifying multiple data centers is deploying a Pluribus multi-site fabric just for interconnection, leaving the existing leaf-spine networks unchanged as shown below.
In this approach, Pluribus-enabled border leaf clusters are added to each leaf-spine network. This type of “brownfield” network insertion is possible because Pluribus software implements standards-based L2 and L3 protocols for full interoperability with other vendors. The border leaf switches in each data center are then joined into a single “Data Center Interconnect fabric” using the Adaptive Cloud Fabric. This allows simplified provisioning of layer 2 network extensions between data centers over any underlying DCI or WAN technology and any arbitrary topology. Once this DCI fabric is in place, it can be expanded over time to incorporate the other leaf switches in order to provide an end-to-end fabric for true multi-site unification.
This DCI Fabric approach looks similar to the multi-tenant software-defined interconnection fabric described earlier because in both cases the fabric is solely used for data center interconnection and stops at the “edge” of each data center leaf-spine network. However, the two use cases are quite different from a data center networking perspective. In the multi-tenant interconnection case, the fabric is operated by a completely separate entity from the tenant networks that use it, and those tenants connect to the fabric via a data center gateway router to access clouds and external networks for north-south traffic. By contrast, the DCI fabric shown above links border leaf nodes that are integrated into each leaf-spine network in order to carry east-west traffic for a single enterprise or service provider who operates the entire multi-site DC network.
Data center interconnect is clearly many things to many people, so it is important to “see the whole elephant” and understand how DCI fits into a broader data center networking context. For operators of multi-site data centers who are ready to move beyond DCI and unify their networks in a multi-site data center fabric, Pluribus offers an innovative solution with numerous benefits compared to alternative approaches, from easy brownfield insertion and migration to dramatically simplified and automated network operations.