Published by Jay Gill

Data center networking has evolved substantially over the past two decades, driven by the move to virtualized applications and cloud computing. Hyperscale cloud providers pursued radical innovation in network architectures and operations to scale efficiently and increase agility, and many of their ideas have filtered into the mainstream. Today every data center operator, from software-as-a-service (SaaS) and managed service providers to enterprises operating private clouds, strives for a similar level of agility and efficiency.

Why Data Center Networks Still Matter for Private Clouds

The current consensus is that enterprises are embracing hybrid multi-cloud strategies and running workloads in multiple clouds, choosing the right cloud to match the requirements of each application. While public clouds are optimal for many workloads, analysts project that the majority of workloads will continue to run in private cloud infrastructure. Some of that infrastructure will be in enterprise-owned (“on-premises”) data centers, while a growing share of private cloud infrastructure will be in colocation or edge data centers, or managed by a hosted cloud service provider.

That means enterprises and managed service providers need data center networks that enable their private clouds to operate with the agility and efficiency of a public cloud. Application developers – the customers of the network operations team – increasingly use agile “DevOps” approaches whether they deploy in a public cloud or a private cloud. They need virtual machines or containers, along with all required networking services, to be spun up or moved anywhere within the private cloud in seconds or minutes. In other words, they need the network to adapt at the speed of cloud applications, which means bringing the DevOps mindset to network operations. Private cloud data center fabrics must also enable high application availability and performance, and stretch across multiple data centers and edge computing sites.

Building Data Center Networks for Private Clouds

A few areas of consensus have developed about the best way to build data center networks for clouds:

However, there are many areas where consensus is still lacking and network architects are faced with multiple options and trade-offs. Questions they face include:

After a brief review of data center networking basics and recent evolution, we will address the first of those questions. Upcoming blogs will tackle the rest.

Data Center Networking from Past to Present
To know where data center networking is going, it’s useful to understand where it has been.

What is Data Center Networking?
Data centers exist to enable software applications that support a digital business. The key elements of data center infrastructure are servers, storage and networking. Applications need servers for computing power and storage devices for data, and they need networks to connect to users and to other applications. Hence the simplest definition of data center infrastructure:

Data Center Infrastructure = computing + storage + networking

Going back several years, as rackable servers replaced larger mainframe and mini-computers, a hierarchical three-tier data center network architecture evolved (see Figure 1). Each server in a rack connects via Ethernet cables to an access switch, usually in the same rack and sometimes called a “top of rack” or TOR switch. For high availability, servers may have two network interface cards (NICs) connecting to two redundant access switches, sometimes referred to as a switch cluster. The access switches connect to aggregation switches, which in turn connect to the data center core switches, which connect to users outside the data center via enterprise wide area networks (WANs) and/or the public Internet.

(Note that Figure 1 focuses on the internal data center network architecture and does not depict all of the networking equipment that may be deployed at the edge of the data center, where North-South traffic enters and leaves to the outside world. Typical data center edge equipment includes edge or “gateway” routers for connection to the public Internet, and network security devices such as firewalls to block unwanted traffic. Additional devices intended to optimize application performance, such as load balancers and application delivery controllers, may also be deployed at the data center edge.)

Figure 1: Traditional Three-tier Hierarchical DC Network

In the three-tier network architecture, the aggregation and core switches were typically expensive chassis-based switches with multiple line cards. Traffic was assumed to flow with reasonable predictability between users connected to the networks at the top of the diagram (aka “North”) and applications at the bottom (aka “South”), which is why we say this architecture was optimized for North-South traffic. The expensive aggregation and core layers were often heavily over-subscribed to control costs, with far fewer links and switch ports than would be required if all servers were simultaneously transmitting at the full rate of their network uplinks.

North-South versus East-West Data Center Network Traffic

As we have discussed extensively in a previous blog, this architecture has many drawbacks that make it unsuitable for current cloud-driven data center requirements. Server virtualization and containerization have given rise to a much more dynamic environment where applications are no longer monolithic or tied to specific servers. Applications can be componentized, with parts of the application running in different servers or even different data center locations. Workloads can be moved or scaled on demand, and developers can architect their applications for rapid iteration and deployment.

One result of this trend is a dramatic increase in traffic between servers in the same data center. Visualizing such traffic as connecting the servers on the right (aka “East”) and left (aka “West”) of a data center diagram, we refer to this traffic as East-West. Data center network architecture needed a fundamental change to deal with the rise in East-West traffic.

Leaf-Spine Data Center Fabrics

The answer was Clos-based leaf-and-spine architectures such as the one shown in Figure 2. This architecture is often referred to as a data center fabric because of its rich interconnection links. The top-of-rack switches, now referred to as leaf switches, are redundantly connected to multiple spine switches with sufficient uplink capacity to create a non-blocking architecture. (In practice, many leaf-spine networks are engineered with a small amount of oversubscription, for example 1.2:1.) As a result, any leaf can reliably reach any other leaf through a single spine switch, with predictable latency and without concern for congestion and packet loss. This enables far greater capacity for East-West traffic and reduces or eliminates the need for link-by-link capacity engineering within the data center fabric.

Figure 2: Clos-based Leaf and Spine Data Center Fabric
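To make the oversubscription figure concrete, here is a small sketch computing the downlink-to-uplink capacity ratio for a leaf switch. The port counts and speeds below are hypothetical, chosen to match the 1.2:1 example:

```python
def oversubscription_ratio(server_ports: int, server_speed_gbps: float,
                           uplinks: int, uplink_speed_gbps: float) -> float:
    """Ratio of server-facing (downlink) to spine-facing (uplink) capacity.

    1.0 means the leaf is non-blocking; values above 1.0 mean the
    uplinks can be oversubscribed if every server transmits at line rate.
    """
    downlink = server_ports * server_speed_gbps
    uplink = uplinks * uplink_speed_gbps
    return downlink / uplink

# Hypothetical leaf: 48 x 25G server ports, 10 x 100G spine uplinks
ratio = oversubscription_ratio(48, 25, 10, 100)
# 1200G of server capacity over 1000G of uplink capacity -> 1.2:1
```

With 12 x 100G uplinks instead of 10, the same leaf would be fully non-blocking (1.0:1), at the cost of two more spine ports per leaf.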

Non-blocking leaf-spine architectures not only reduced the challenge of traffic engineering and congestion management; they also enabled several powerful new ideas:

The result is a widely accepted framework combining a scale-out L3 leaf-spine underlay with a VXLAN overlay, as depicted in Figure 3.

Figure 3: Scalable Data Center Fabric Architecture with L3 Underlay and VXLAN Overlay
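For readers curious about the mechanics of the overlay: VXLAN (RFC 7348) encapsulates Layer 2 frames in UDP (destination port 4789) with an 8-byte header carrying a 24-bit virtual network identifier (VNI), which is what allows millions of isolated virtual networks to share one L3 underlay. A minimal sketch of that header layout:

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header defined in RFC 7348.

    Layout: 1 byte of flags (bit 0x08 set means a valid VNI is present),
    3 reserved bytes, a 3-byte VNI, and 1 more reserved byte.
    """
    assert 0 <= vni < 2**24, "VNI is a 24-bit field"
    flags = 0x08
    # Pack as two 32-bit words in network byte order:
    # word 1 = flags + reserved, word 2 = VNI shifted past the last reserved byte.
    return struct.pack("!II", flags << 24, vni << 8)

hdr = vxlan_header(10042)
# The VNI sits in bytes 4..6 of the header
vni_on_wire = int.from_bytes(hdr[4:7], "big")
```

The 24-bit VNI is the key scaling point versus traditional VLANs: roughly 16 million segments instead of 4094.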

Data Center Networking Challenges: Getting from the Present to the Future

While the consensus is clear that data center networks for private clouds should be built with scale-out leaf-spine architectures, robust L3 underlays and some form of overlay network virtualization, there is less consensus about several other choices facing data center network architects, as noted in the introduction. 

In this blog we will address the first of the open questions and challenges facing network architects: when and how to move to disaggregated networking.

What is Disaggregated Networking and Why is it More Cost Effective?

Disaggregated networking refers to the separation of networking hardware and software. This is sometimes also referred to as open networking because the interface to the networking hardware is published and open for any compatible software network operating system (NOS) to run on the box and create a complete networking solution. In many cases, that open interface conforms to industry standards such as the Open Network Install Environment (ONIE). Open network switching hardware is also referred to as “white-box” or “bare-metal” switching.

Figure 4: Disaggregated Networking

Disaggregated networking is highly attractive for several reasons. Innovation in both hardware and software layers is accelerated. Disaggregated hardware is built with the same commodity switching silicon driven by the hyperscale cloud providers, resulting in superior price-performance versus closed, proprietary hardware from traditional vendors. Best-of-breed hardware and software products can be chosen to optimize for performance and/or cost. And network operators reduce their reliance on a particular integrated networking vendor, often referred to as vendor “lock-in.”

Based on these advantages, open disaggregated networking is the fastest growing segment of data center networking, with a robust vendor ecosystem. Some of the largest providers of disaggregated switching hardware for data center networks are Edgecore, Dell Technologies and Celestica.

The ecosystem of NOS software providers is also strong but evolving. NOS providers can leverage and contribute to robust open-source networking software such as FRRouting (FRR), then add innovative features and capabilities to create differentiated solutions for different customers and markets. Some independent NOS providers, such as Cumulus and Big Switch Networks, have been acquired by integrated networking companies, while other NOS providers, including Pluribus Networks, continue to grow independently. Meanwhile, newer open-source NOS options, such as SONiC, are starting to mature for non-hyperscale deployments and are gaining industry interest.

How and When to Adopt Disaggregated Data Center Networking

Given the clear benefits, many data center operators would like to adopt disaggregated networking but they face some important choices about how and when to do so. Disaggregated networking must be deployed as part of an overall data center networking strategy, including overlay network virtualization, multi-site data center interconnection and unification, network automation and network visibility, and putting all of those pieces together can be daunting.

Some data center operators with large internal network engineering teams capable of do-it-yourself (DIY) integration and programming may embrace open-source SONiC as their NOS of choice and BGP EVPN as a standards-based overlay technology. They can then create their own network automation environment (for example, using custom Python scripting) to deal with the complex challenge of provisioning underlay and overlay networks and scaling network operations across multiple data centers.
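As a rough illustration of what such DIY automation might look like, the sketch below renders a minimal FRR-style BGP EVPN leaf configuration from a handful of fabric parameters. All hostnames, ASNs and addresses are hypothetical, and a real deployment would layer in interface, VNI and underlay configuration as well:

```python
def render_leaf_config(hostname: str, asn: int, router_id: str,
                       spine_peers: list) -> str:
    """Render a minimal FRR-style BGP EVPN config for one leaf switch.

    Each spine is peered with eBGP ("remote-as external") and activated
    in the l2vpn evpn address family so the leaf advertises its VNIs.
    """
    lines = [
        f"hostname {hostname}",
        f"router bgp {asn}",
        f" bgp router-id {router_id}",
    ]
    for peer in spine_peers:
        lines.append(f" neighbor {peer} remote-as external")
    lines.append(" address-family l2vpn evpn")
    for peer in spine_peers:
        lines.append(f"  neighbor {peer} activate")
    lines.append("  advertise-all-vni")
    lines.append(" exit-address-family")
    return "\n".join(lines)

cfg = render_leaf_config("leaf01", 65101, "10.0.0.11",
                         ["10.0.1.1", "10.0.1.2"])
```

In practice a team would drive a generator like this from a fabric inventory (YAML, a CMDB, or IPAM data) and push the result to each switch, which is exactly the kind of glue work that makes the DIY path labor-intensive.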

For the majority of data center operators, that DIY approach is difficult at best and maybe impossible. These operators will look to software and solution providers who can provide pre-integrated solutions, built on the foundation of disaggregated networking, in order to build robust underlay and overlay networks, unify their private clouds across multiple sites and achieve their network automation and visibility goals.

As you can see, data center operators must consider how disaggregated networking fits into a complete data center network architecture that aligns to their private cloud requirements and their capabilities. To do so, they must address these other open questions and challenges:

Each of these questions is addressed in this blog series, starting with the next blog: What to Know About Data Center Overlay Networks.

If you have more questions and would like to talk to one of us over here at Pluribus don’t hesitate to contact us.