Many data center operators are interested in bringing the benefits of hyperscaler technologies to on-prem data centers. One of these technologies is SONiC, an open source network operating system that is being advanced under the auspices of the Open Compute Project (OCP). There are a number of questions that enterprises, communication service providers and tier 2 cloud service providers need to ask themselves to understand if SONiC is a good choice for their on-prem data center and private cloud networks.
What is SONiC?
SONiC, which stands for “Software for Open Networking in the Cloud,” is a network operating system originally designed by Microsoft for their data center networks. Microsoft was frustrated with the overly complex operating systems provided by vendors like Cisco, Juniper and Arista, which included many features that Microsoft simply did not need for their Azure cloud network. Microsoft therefore built SONiC in a completely modular way, running networking functions in containers so that components could be added or removed. The result was a lean, optimized OS that contained only the features essential to run the Microsoft Azure cloud network. They also developed the Switch Abstraction Interface (SAI) with the goal of enabling SONiC to run across switches from different vendors and even on different packet processing ASICs.
Image from “SONiC: The networking switch software that powers the Microsoft Global Cloud”
How did SONiC become an open-source project?
Microsoft decided that the best way to continue to drive SONiC forward in terms of maturity, scale and automation was to contribute the code to an open source project that would bring a community of developers together to work on and enhance the code. Thus in 2017 SONiC became a subproject under the Open Compute Project networking project group. Some hypothesize that, since this is not a revenue-generating software initiative for Microsoft, this was the most cost-efficient way for Microsoft to secure enough engineering talent to continue to enhance SONiC for their future needs.
Should I deploy SONiC Community Edition in my network?
Any non-hyperscaler enterprise or service provider can deploy SONiC’s “free” community edition, which can be downloaded from GitHub. The promise of a free OS is certainly one of the most appealing factors to many networking teams. That said, the investment in people and lab infrastructure required to stand up the community version of SONiC so that it runs reliably in a production-grade network is massive. One of the few (the only?) enterprises that I am aware of that have successfully deployed the community edition of SONiC is the large retailer Target. This 30-minute presentation from the OCP 2021 Fall Summit describes how Target customized SONiC for their use and built out a full SONiC OS test lab and QA team, investing significant engineering and QA resources to deploy SONiC in their production environment. And even then, towards the end of their project, they hired an outside vendor to help them with future customizations and to provide ongoing support.
In the video one of the lead engineers at Target explains that “you need to have an Explorers mindset” and be willing to be a pioneer and accept failures along the way as you are effectively doing work on your own network operating system. Not all enterprises are in a position to be explorers like this and those who are, like Target, may still end up needing outside support. The question you must ask yourself is whether your networking team has the resources available and the appetite to invest in customizing and testing your own distribution of SONiC.
What about deploying vendor distributions of SONiC in my network?
As another option, vendors such as Dell, Edgecore, Mellanox and several others are working on their own distributions of SONiC that include enterprise and service provider customizations, documentation and support. If you pursue this path, the good news is you have a vendor partner who can handle the essential QA testing to work towards a stable distribution of SONiC that will be reliable in production and will also provide 24 x 7 support. Of course, the “free” aspect of SONiC is completely gone if you choose this option. Based on discussions with a number of customers who considered SONiC, these enterprise distributions can be quite expensive. In fact, if you want features such as telemetry, many vendors will require you to upgrade from a standard version to an even more expensive premium version.
Furthermore, as an open source project, SONiC is still maturing, and since it was originally designed for hyperscalers’ somewhat narrow feature requirements, it often lacks key features that might be needed in your environment. For example, a lot of work still needs to be done to enhance layer 2 networking and BGP EVPN VXLAN overlay fabric functionality in SONiC. Also, there is no integrated automation with SONiC, so it is still up to the network operations team either to build their own automation scripts using languages like Python, possibly leveraging Ansible, or to purchase an external automation system, which can add significant costs. You can read more about these various automation choices in the blog “Understanding Various Approaches to Data Center Network Automation.”
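To make the "build your own automation scripts" point concrete, here is a minimal Python sketch of the kind of per-switch plan a NetOps team might generate. The switch names, the port lists and the helper functions are hypothetical. The command strings follow the SONiC "config vlan" CLI form as I understand it, and should be verified against the SONiC release you actually run. The sketch only builds and prints the commands; pushing them to each switch over SSH or through Ansible is left out.

```python
from typing import Dict, List


def vlan_commands(vlan_id: int, member_ports: List[str]) -> List[str]:
    """Build the CLI commands that create one VLAN on one switch.

    The command syntax is an assumption based on the SONiC 'config vlan'
    CLI; check it against your SONiC release before using it.
    """
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan_id}")
    commands = [f"sudo config vlan add {vlan_id}"]
    for port in member_ports:
        commands.append(f"sudo config vlan member add {vlan_id} {port}")
    return commands


def build_plan(inventory: Dict[str, List[str]], vlan_id: int) -> Dict[str, List[str]]:
    """Repeat the same VLAN intent for every switch in the inventory."""
    return {host: vlan_commands(vlan_id, ports) for host, ports in inventory.items()}


# Hypothetical inventory: switch name -> ports that should carry the VLAN.
inventory = {
    "leaf1": ["Ethernet0", "Ethernet4"],
    "leaf2": ["Ethernet0"],
}

plan = build_plan(inventory, 110)
for host, commands in plan.items():
    for command in commands:
        print(f"{host}: {command}")
```

Note that nothing in a script like this guarantees consistency: if one switch is unreachable when the commands are pushed, the team must detect and repair the partial rollout themselves.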
In a nutshell, the perpetual or subscription license cost of a vendor distribution of SONiC, once all of the options have been selected, might surprise you. In addition, the operational costs and the time consumed from your team, even for a vendor distribution, can dwarf the capex. Even with vendor distributions you will need a bit of an explorer’s mindset.
Pluribus Netvisor and the Adaptive Cloud Fabric – an Alternative to SONiC
Pluribus Networks provides a very powerful data center networking solution called the Netvisor® ONE operating system (OS) software. Like SONiC, Netvisor ONE is a Linux-based OS built on open source software such as the Linux Foundation FRRouting (FRR) software. Netvisor runs on a wide variety of open networking switches from Dell and Edgecore, and even on Pluribus’ own OCP-accepted Freedom Series switches. Netvisor is a carrier-grade OS that is fully automated, instrumented, tested and hardened in large-scale enterprise and service provider networks around the globe. (Fact: Netvisor ONE is deployed in the 4G/5G virtualized mobile cores of over 100 mobile network operators.)
Built into Netvisor ONE is the Adaptive Cloud Fabric™ (ACF) software, an integrated software-defined networking (SDN) control plane solution which provides full automation of the underlay and overlay fabric and also provides, at no extra cost, comprehensive per-flow telemetry. ACF is an intent-based, declarative cloud networking solution that is highly automated, and since it is integrated into the OS, every feature is automated and fully tested. In a typical enterprise or service provider use case, on Day 1 the underlay and full mesh overlay fabrics are automatically deployed by ACF using the CLI or our UNUM Fabric Manager graphical user interface. Once the fabric is deployed, rolling out new Day 2 network services takes only seconds and is very similar to what IT teams experience when provisioning a networking service in the public cloud. L2/L3 network services or security policies are deployed as objects: the user thinks in terms of services and objects and declares intent to deploy a network service, fabric-wide. For example, deploying a VLAN can be achieved with the single command “vlan-create id 110 scope fabric”. With this one command the NetOps manager declares their intent to deploy VLAN 110 across all switches in the fabric, and the SDN control plane atomically deploys the new VLAN object to every switch and ensures, through a transactional database mechanism, that the configuration is consistently implemented across all switches in the fabric. The net result is that instead of spending time SSHing into every switch to deploy a VLAN or building Python scripts, the NetOps team focuses on deploying services fabric-wide in seconds, with 100% consistency, and delighting their customers.
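The idea of an atomic, fabric-wide change can be illustrated with a toy Python model. This is not how ACF is implemented; the Switch class, the reachable flag and the create_vlan_fabric_wide function are all hypothetical names invented for this sketch. It shows only the all-or-nothing behavior described above: the change is checked on every switch first and is committed only if every switch can accept it.

```python
from typing import List, Set


class Switch:
    """Toy stand-in for a fabric member; not a real Netvisor or SONiC API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.vlans: Set[int] = set()
        self.reachable = True

    def can_accept(self, vlan_id: int) -> bool:
        # A switch accepts the change only if it is reachable and the ID is valid.
        return self.reachable and 1 <= vlan_id <= 4094


def create_vlan_fabric_wide(fabric: List[Switch], vlan_id: int) -> bool:
    """Apply the VLAN to every switch, or to none of them."""
    # Phase 1: every switch must agree before anything is changed.
    if not all(switch.can_accept(vlan_id) for switch in fabric):
        return False
    # Phase 2: commit the change on all switches.
    for switch in fabric:
        switch.vlans.add(vlan_id)
    return True


fabric = [Switch("leaf1"), Switch("leaf2"), Switch("spine1")]

# All switches are healthy, so VLAN 110 lands everywhere.
first_ok = create_vlan_fabric_wide(fabric, 110)

# One switch is down, so VLAN 120 is not applied anywhere.
fabric[1].reachable = False
second_ok = create_vlan_fabric_wide(fabric, 120)

print(first_ok, second_ok)
print({switch.name: sorted(switch.vlans) for switch in fabric})
```

A per-switch script would instead leave VLAN 120 configured on some switches and missing on others, which is the inconsistency a transactional mechanism is meant to prevent.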
Summary – What is the right choice for you?
At the end of the day, it all comes down to your data center network objectives, time and money. Do you want to prioritize your time on deploying services or customizing your own OS? If your team is large enough, needs full control of the network OS and has the time and ability to tinker and tweak in the bowels of the OS along with building automation scripts, then SONiC could be a good fit. On the other hand, if your network or IT team is already as busy as most teams are, and your goal is to find the most efficient way to achieve your business outcomes and delight your end users, then an open source networking solution that is pre-integrated, fully automated and QA’d such as the Pluribus Netvisor ONE OS is probably the right choice for you.