Pluribus Networks
Traditional Network Infrastructure Model and Problems Associated with it

Posted on July 18, 2012 by peter

Traditionally, datacenter network infrastructure for large companies or large compute farms was built on a three-layer hierarchical model, which Cisco calls the “hierarchical inter-networking model”. It consists of core layer switches ($$$), which connect to distribution layer switches ($$) (sometimes called aggregation switches), which in turn connect to access layer switches ($). Access layer switches are frequently located at the top of a rack, so they are also known as top-of-rack (ToR) switches. Most network infrastructure is still laid out this way today.
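To make the three tiers concrete, here is a minimal Python sketch that counts how many switches a Layer 2 frame crosses between two servers. It assumes an idealized tree (one ToR switch per rack, one aggregation switch per pod of racks, a single core layer, no redundant paths); `ServerLocation` and `switches_traversed` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerLocation:
    """Hypothetical position of a server in an idealized three-tier tree."""
    pod: int   # which aggregation switch serves the server's rack
    rack: int  # which ToR switch within that pod the server is cabled to


def switches_traversed(src: ServerLocation, dst: ServerLocation) -> int:
    """Switches a Layer 2 frame crosses on the tree path from src to dst.

    Assumes one ToR per rack, one aggregation switch per pod, a single
    core layer above all pods, and no redundant paths.
    """
    if src == dst:
        return 1  # same rack: the ToR switches the frame locally
    if src.pod == dst.pod:
        return 3  # ToR -> aggregation -> ToR
    return 5      # ToR -> aggregation -> core -> aggregation -> ToR


if __name__ == "__main__":
    here = ServerLocation(pod=0, rack=0)
    print(switches_traversed(here, ServerLocation(pod=0, rack=0)))  # 1
    print(switches_traversed(here, ServerLocation(pod=0, rack=1)))  # 3
    print(switches_traversed(here, ServerLocation(pod=1, rack=0)))  # 5
```

Only same-rack traffic stays on a single switch; anything else has to climb at least one tier of the hierarchy.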

[Figure: The Hierarchical Model in a Datacenter Network Switching Architecture]

The good news with this hierarchical model is that Layer 2 traffic between two nodes in the same rack is switched by the ToR switch alone, so it is delivered with low latency. If the access switches are 10Gb, that communication can have high throughput as well. This type of configuration also allows for a vast number of ports at the access layer.

The bad news is, well, everything else about this model. It’s expensive. East-west communication, say between racks of gear, means that traffic travels to the aggregation layer and frequently to the data center core. These multiple hops, frequently across over-subscribed backplanes, take a very long time: 50 µs or more with traditional vendors and their traditional solutions. Any Layer 3 traffic needs to leave the rack and reach the aggregation tier of switches before being routed, even if it is headed back to the same rack it came from.
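Oversubscription is the ratio of server-facing bandwidth to uplink bandwidth on a switch. The sketch below computes it and, under the simplifying assumption that every tier is saturated at the same time, compounds the per-tier ratios into a worst-case figure for traffic that crosses the core. The port counts in the example are hypothetical, not taken from any particular product, and the function names are illustrative.

```python
from fractions import Fraction


def oversubscription_ratio(down_ports: int, down_gbps: int,
                           up_ports: int, up_gbps: int) -> Fraction:
    """Ratio of downstream (server- or ToR-facing) bandwidth to uplink bandwidth.

    A ratio above 1 means the uplinks cannot carry everything the
    downstream ports could send at line rate, so frames queue or drop.
    """
    return Fraction(down_ports * down_gbps, up_ports * up_gbps)


def end_to_end_ratio(*tier_ratios: Fraction) -> Fraction:
    """Worst-case oversubscription for traffic crossing several tiers.

    Assumes the per-tier ratios compound multiplicatively, which holds
    when every tier is saturated simultaneously.
    """
    total = Fraction(1)
    for ratio in tier_ratios:
        total *= ratio
    return total


if __name__ == "__main__":
    # Hypothetical ToR: 48 x 10G server ports, 4 x 10G uplinks.
    tor = oversubscription_ratio(48, 10, 4, 10)
    # Hypothetical aggregation switch: 32 x 10G ToR-facing ports, 8 x 10G core uplinks.
    agg = oversubscription_ratio(32, 10, 8, 10)
    print(tor, agg, end_to_end_ratio(tor, agg))  # 12 4 48
```

With these assumed numbers, servers in different pods competing for the core could each get as little as 1/48 of their line rate.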

And east-west traffic isn’t the exception in modern application deployments; it’s the norm. This traffic comes from applications talking to each other, to databases, and to IP-attached storage.

The problems with this typical hierarchical network multiply when the servers in the racks run virtual machine managers and virtual machines, and limits abound. East-west traffic becomes even more prevalent, because virtualization essentially randomizes the locations of the (virtual) servers. With a traditional architecture, the datacenter manager could load a rack with components that were likely to communicate with each other (say, application servers and database servers). With virtualization, those components could be anywhere within the virtualized infrastructure. Virtualization also pushes the limits of Layer 2 segmentation. For example, the IEEE 802.1Q standard uses a 12-bit VLAN ID, so there can be at most 4096 VLANs (4094 usable, since IDs 0 and 4095 are reserved), which can impose artificial limits within a virtualized facility. While a facility might naturally need thousands of VLANs for multi-tenancy, the VLAN limit may force it to be divided into multiple small virtualization clusters. This limits resource management options, for example preventing a VM from being moved to the least loaded server if that server is in another cluster.
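The 4096 ceiling falls directly out of the 802.1Q tag format: the tag is 4 bytes, a 16-bit TPID (0x8100) followed by a 16-bit TCI that holds a 3-bit priority (PCP), a 1-bit drop-eligible indicator (DEI), and a 12-bit VLAN ID. The minimal sketch below packs and parses such a tag with Python's `struct` module; `make_vlan_tag` and `parse_vid` are illustrative names, not part of any networking library.

```python
import struct

TPID_8021Q = 0x8100           # EtherType value that marks an 802.1Q tag
VID_BITS = 12                 # width of the VLAN ID field in the TCI
TOTAL_VIDS = 1 << VID_BITS    # 4096 possible identifiers
USABLE_VIDS = TOTAL_VIDS - 2  # IDs 0 and 4095 (0xFFF) are reserved


def make_vlan_tag(vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Pack a 4-byte 802.1Q tag: 16-bit TPID, then 16-bit TCI (PCP | DEI | VID)."""
    if not 0 <= vid < TOTAL_VIDS:
        raise ValueError(f"VLAN ID {vid} does not fit in {VID_BITS} bits")
    tci = ((pcp & 0x7) << 13) | ((dei & 0x1) << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vid(tag: bytes) -> int:
    """Extract the 12-bit VLAN ID from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError(f"unexpected TPID 0x{tpid:04x}")
    return tci & 0x0FFF


if __name__ == "__main__":
    print(TOTAL_VIDS, USABLE_VIDS)               # 4096 4094
    print(make_vlan_tag(100, pcp=5).hex())       # 8100a064
    print(parse_vid(make_vlan_tag(100, pcp=5)))  # 100
    try:
        make_vlan_tag(5000)  # a 5000th tenant cannot get its own VLAN
    except ValueError as err:
        print(err)
```

No amount of configuration raises the limit; the only escape is a different encapsulation with a wider identifier.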

Network protocols can work around this so that VMs can migrate between VLANs, for example by using generic routing encapsulation (GRE) to tunnel Layer 2 frames through Layer 3 infrastructure. VMware has its own solution, which works with specific components (vDS) and provides MAC-in-MAC encapsulation, replacing VLANs with Port Group Isolation. These solutions are problematic because of proprietary vendor lock-in, extra overhead, or extra complexity.
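Part of the “extra overhead” of tunneling is visible in the bytes themselves. This sketch builds the minimal 4-byte GRE header (RFC 2784, no optional fields) with protocol type 0x6558 (Transparent Ethernet Bridging), which marks the payload as a full Ethernet frame, and computes how much of a 1500-byte underlay MTU is left for the inner IP packet. It assumes an outer IPv4 header without options and an untagged inner frame whose FCS is not carried; the helper names are hypothetical.

```python
import struct

GRE_PROTO_TEB = 0x6558  # Transparent Ethernet Bridging: the payload is an Ethernet frame
IPV4_HEADER_LEN = 20    # outer IPv4 header without options (IP protocol 47 = GRE)
ETH_HEADER_LEN = 14     # inner dst MAC + src MAC + EtherType; untagged, FCS assumed not carried


def gre_header(proto: int = GRE_PROTO_TEB) -> bytes:
    """Minimal 4-byte GRE header: no checksum, key, or sequence number; version 0."""
    flags_and_version = 0x0000
    return struct.pack("!HH", flags_and_version, proto)


def encapsulate(inner_frame: bytes) -> bytes:
    """Prepend a GRE header to an Ethernet frame.

    The result is the payload of the outer IPv4 packet; a real tunnel
    endpoint would also add that outer header, which this sketch omits.
    """
    return gre_header() + inner_frame


def max_inner_packet(underlay_mtu: int = 1500) -> int:
    """Largest inner IP packet that fits without fragmenting the outer packet."""
    overhead = IPV4_HEADER_LEN + len(gre_header()) + ETH_HEADER_LEN
    return underlay_mtu - overhead


if __name__ == "__main__":
    print(gre_header().hex())  # 00006558
    print(max_inner_packet())  # 1462: 38 bytes of every packet go to tunnel headers
```

Either the underlay MTU has to be raised or the VMs' MTU lowered, and every tunnel endpoint spends cycles on encapsulation.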

More bad news arrives when datacenter managers want to make any changes to their existing architecture. Once this hierarchical infrastructure is in place, change is difficult. Another rack of gear means not only another ToR switch, but possibly another aggregation switch or even more ports in the core switch. If an application running in a rack needs more throughput, how is it delivered? Trunking multiple Ethernet connections into a single host helps, but what if the throughput is needed to applications running in other racks? Spanning Tree Protocol (STP) blocks redundant links between switches to prevent loops, so adding connections between switches does not add usable bandwidth. The result is bottlenecks above and beyond the existing high latency.
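Why extra cabling does not help under STP can be seen with a simplified model: the lowest bridge ID becomes root, each other switch keeps one shortest path toward it, and every remaining link is blocked. This sketch uses breadth-first search with equal link costs; it approximates rather than reproduces IEEE 802.1D tie-breaking, and `spanning_tree` plus the example topology are hypothetical.

```python
from collections import deque


def spanning_tree(links: list[tuple[int, int]]) -> tuple[set[int], set[int]]:
    """Split links (by index) into forwarding and blocked sets, STP-style.

    Simplifications: bridge IDs are plain integers, every link has equal
    cost, ties go to whichever link breadth-first search meets first, and
    every switch is assumed reachable from the root. Real IEEE 802.1D also
    compares port costs, bridge IDs, and port IDs.
    """
    adjacency: dict[int, list[tuple[int, int]]] = {}
    for idx, (a, b) in enumerate(links):
        adjacency.setdefault(a, []).append((b, idx))
        adjacency.setdefault(b, []).append((a, idx))

    root = min(adjacency)  # the lowest bridge ID wins the root election
    reached = {root}
    forwarding: set[int] = set()
    queue = deque([root])
    while queue:
        bridge = queue.popleft()
        for neighbor, idx in sorted(adjacency[bridge]):
            if neighbor not in reached:  # first (shortest-hop) path to this bridge
                reached.add(neighbor)
                forwarding.add(idx)
                queue.append(neighbor)

    blocked = set(range(len(links))) - forwarding  # every other link would form a loop
    return forwarding, blocked


if __name__ == "__main__":
    # Hypothetical: aggregation switches 1 and 2 joined by two parallel links,
    # and ToR switches 3 and 4 each dual-homed to both aggregation switches.
    links = [(1, 2), (1, 2), (1, 3), (2, 3), (1, 4), (2, 4)]
    forwarding, blocked = spanning_tree(links)
    print(sorted(forwarding), sorted(blocked))  # [0, 2, 4] [1, 3, 5]
```

Four switches always end up with three forwarding links, however many are cabled; the second link between switches 1 and 2 sits idle until the first one fails.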

And let’s hope that no errors, odd behaviors, connection drops, or performance issues occur in this traditional architecture, because visibility into traffic is limited and debugging is a challenge. Quality of service, traffic prioritization, and packet or traffic capture (for regulations or debugging) are all challenging, or downright impossible. In fact, many network admins need to worry not only about mean time to recovery (MTTR) from a problem, but also about MTTI: mean time to innocence.

When there is a problem in the infrastructure, networking is frequently the first area blamed, because it’s difficult to prove that the problem is not in the network.

So far we have ignored desirable networking features such as firewalling and load balancing. Firewalling is typically provided outside of this traditional network infrastructure, or via ad hoc methods. VMware, again, can firewall within its hypervisors, but what happens if those virtual servers need to talk to a physical database server or NAS storage? In almost all cases that traffic is not firewalled, due to the cost and performance impact of firewalling. Even datacenters that are willing to make those compromises are challenged. If general-purpose firewall features are desired, then all pertinent traffic needs to get to a firewalling device (perhaps a line card in a core router), taking multiple hops and adding latency just to reach the point where it is filtered. Special-purpose firewalls could be added, for example between the database server and all other servers, but the need to firewall other traffic means adding more and more physical firewalls (with the resulting cost, complexity, and fragility). Managing such infrastructure, especially if different firewalling methods are used at different tiers, increases complexity further.
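The hairpin effect can be put in numbers with a deliberately simple sketch: in an idealized three-tier tree with a single uplink path per switch, a flow that must be inspected by a firewall in the core climbs to the top of the tree and back down, even when both endpoints share a rack. The tier names and the `switches_crossed` helper are hypothetical, and the model ignores redundant paths and per-hop latency differences.

```python
# Tiers of a hypothetical three-tier tree, from the servers upward.
TIERS = ("tor", "aggregation", "core")


def switches_crossed(turnaround_tier: str) -> int:
    """Switches a flow crosses if it must climb to `turnaround_tier` before
    coming back down to a server (the turnaround switch is counted once)."""
    height = TIERS.index(turnaround_tier) + 1
    return 2 * height - 1


if __name__ == "__main__":
    # Two servers in the same rack, no firewall: the ToR switches the flow.
    print(switches_crossed("tor"))   # 1
    # Same two servers, but policy requires a firewall line card in the core:
    # ToR -> aggregation -> core (filter) -> aggregation -> ToR.
    print(switches_crossed("core"))  # 5
```

Under these assumptions, filtering same-rack traffic multiplies the number of switches it crosses by five, before counting the firewall's own processing time.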

Load balancing may seem like a point solution for specific needs, such as spreading traffic across multiple web servers. It is becoming more important, however, as load balancing is now required in areas as diverse as email services (Exchange 2010 and beyond) and NAS storage (NetApp Cluster-mode NAS appliances). Load balancing will likely become more pervasive over time, presenting network infrastructure managers with challenges similar to those of firewalling.

A more modern design “flattens” this hierarchical network to improve performance for east-west traffic. These networks remove the aggregation layer, which requires more ports in the core layer (with variations depending on the networking vendor). While that provides a step-wise improvement over the previous design, it still falls short in flexibility, performance, manageability, functionality, and cost.
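The port cost of flattening is simple arithmetic: without an aggregation tier, every ToR uplink has to land on a core port, whereas in the three-tier design only the aggregation switches’ uplinks do. The sketch below compares the two under hypothetical numbers (rack count, uplinks per switch, pod size, and core switch port count are all assumptions) and ignores redundancy such as dual-homed core pairs.

```python
import math


def flat_core_ports(racks: int, uplinks_per_tor: int) -> int:
    """Flattened design: every ToR uplink terminates on a core port."""
    return racks * uplinks_per_tor


def three_tier_core_ports(racks: int, racks_per_pod: int, uplinks_per_aggregation: int) -> int:
    """Three-tier design: only aggregation switches connect to the core."""
    pods = math.ceil(racks / racks_per_pod)
    return pods * uplinks_per_aggregation


def core_switches_needed(core_ports: int, ports_per_core_switch: int) -> int:
    """Core chassis needed to terminate the given number of ports."""
    return math.ceil(core_ports / ports_per_core_switch)


if __name__ == "__main__":
    # Hypothetical facility: 100 racks, 4 uplinks per ToR; pods of 20 racks,
    # 8 core uplinks per aggregation switch; 128-port core switches.
    flat = flat_core_ports(100, 4)
    tiered = three_tier_core_ports(100, 20, 8)
    print(flat, tiered)  # 400 40
    print(core_switches_needed(flat, 128), core_switches_needed(tiered, 128))  # 4 1
```

With these assumed figures the flattened design needs ten times as many core ports, which is where much of its cost lands.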

So far we’ve painted a bleak picture for network designers and administrators. In future posts we’ll explore solutions to the data center networking problems.

On a side note, we’re also happy to announce that we now have a Twitter feed and will be actively tweeting @pluribusnet.

This entry was posted in Analysis, Networking Architecture and tagged datacenter network, east-west traffic, load balancing, MTTI, network infrastructure, ToR switches, virtualization, VLANs by peter.

About peter

Chief Solutions Architect