So the theme of the day is network virtualization, software-defined networks (SDN) and taking virtualization to its logical conclusion, i.e. server, storage and network in one giant resource pool that can be allocated and assigned any which way. But that is easier said than done. Server and storage virtualization were a bit simpler, since we were dealing with only a single OS that needed to provide the right abstraction layer. The H/W resource pool (disk, CPU, network, memory, etc.) was managed by that single OS, so provisioning it between various virtual machines or storage pools was simpler. The network, by definition, is useful only when multiple devices are connected, and treating those devices as a single resource pool is much harder. A virtual network has to deal not just with links, bandwidth, latency and queues but also with higher-level functionality like routing, load balancing, firewalling, DNS, DHCP, VPN and numerous other services. And we haven’t yet talked about how all this will have to hook up with virtual machines and virtual storage pools in a way that is easy to implement. Now, you might argue that every component is already virtualized, so why do we really need network virtualization? That is true, but those pieces still don’t give us a complete virtual network. Think of someone who is waiting for dinner but is instead served raw potatoes, onions, tomatoes, eggs, frozen meat (and the condiments) and shown the stove to make his own main course!
The real reason network virtualization is a tough problem to solve is the two essential requirements of switching: very high performance and ultra-low latency. These force all the switching functionality into a highly complicated ASIC, which does all the hard work of shuffling 1.2 terabits per second of data at sub-microsecond latencies and hence doesn’t need much software on top. The embedded OS controlling the switch is mostly used just to configure the switch chip through a CLI (command line interface) that lets the administrator control and configure each component on the switch, but almost nothing else. So when we started playing with some of the prototype next-generation boxes that our friends at Fulcrum and Broadcom gave us, we kept asking ourselves whether a real OS running the chip could do something more useful and achieve the elusive goal of complete network virtualization. We even asked our friends whether there was some way for them to put a full-fledged OS on top of these chips (being the OS person I have been for most of my life).
And that was when I realized that to solve the network virtualization problem, we really need an OS on the chip that understands resource pools and virtualization. But a single switch by itself is not very interesting, so we need an OS that controls all the switches. Hmmmm, one OS that controls them all (borrowing from LOTR, which reminds me to ask Peter Jackson whatever happened to the prequel!!). So before we can even start building anything more complicated, we need to build a network hypervisor that has semantics similar to a tightly coupled cluster but controls a collection of switches and scales from one instance to a hundred-plus instances.
Having pioneered virtual switching and resource control in the server OS (Solaris, to be specific: the Crossbow project, which I started in 2003, was integrated into OpenSolaris in 2008), I eventually set out to do the same for larger networks in the form of Pluribus Networks Inc. and to apply the hard lessons learned from enterprise customers. This is what we at Pluribus call “Network 2.0 or Network Virtualization without Limits”.
The Network OS is finally coming to life and is able to treat the network exactly as one giant resource pool. A note of caution, though: please don’t confuse the Network OS with a typical management layer that manages a collection of devices. We still need a management layer to configure and manage the OS, but the policy enforcement, congestion control and resource management across all devices are done by the OS. It is the same with a server cluster, which doesn’t get rid of the management layer but actually gives it something that is more manageable.
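To make the "one giant resource pool" idea a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `Switch` and `NetworkPool` classes, the `allocate` method and the bandwidth numbers are all hypothetical and are not the Pluribus Network OS API. The point is simply that the caller asks the pool for bandwidth and never picks a switch; the pool decides where the reservation lands and refuses requests it cannot satisfy.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """Hypothetical switch with a fixed amount of allocatable bandwidth."""
    name: str
    capacity_gbps: int
    used_gbps: int = 0

    @property
    def free_gbps(self) -> int:
        return self.capacity_gbps - self.used_gbps


@dataclass
class NetworkPool:
    """Presents many switches as one pool (illustrative, not a real API)."""
    switches: list[Switch] = field(default_factory=list)
    # Virtual network name -> {switch name: Gbps reserved on that switch}
    reservations: dict[str, dict[str, int]] = field(default_factory=dict)

    @property
    def free_gbps(self) -> int:
        return sum(s.free_gbps for s in self.switches)

    def allocate(self, vnet: str, gbps: int) -> dict[str, int]:
        """Reserve bandwidth for a virtual network, spilling across switches.

        Raises ValueError, and reserves nothing, if the request is invalid
        or the pool as a whole does not have enough free bandwidth.
        """
        if gbps <= 0:
            raise ValueError("gbps must be positive")
        if gbps > self.free_gbps:
            raise ValueError(f"pool has only {self.free_gbps} Gbps free")

        placement: dict[str, int] = {}
        remaining = gbps
        # Take from the switches with the most free bandwidth first.
        for sw in sorted(self.switches, key=lambda s: s.free_gbps, reverse=True):
            if remaining == 0:
                break
            share = min(sw.free_gbps, remaining)
            if share == 0:
                continue
            sw.used_gbps += share
            placement[sw.name] = share
            remaining -= share

        self.reservations[vnet] = placement
        return placement


pool = NetworkPool([Switch("sw1", 40), Switch("sw2", 40)])
print(pool.allocate("tenant-a", 60))  # the request spans both switches
print(pool.free_gbps)                 # what is left in the whole pool
```

A management layer in this picture would sit on top and call something like `allocate`, while the enforcement logic lives in the pool itself, which is the split between management and OS described above.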
This post was originally published by Sunay on his personal blog.