September 2, 2015

Project Calico: Open Source, High-Scale Network Fabric For The Cloud

Cloud developers and operators are facing a challenge: Much of the IT toolkit that has worked well for "silo" architectures and well enough for virtual machine environments isn't a good match for apps made using containers or for microservices, where components may be not just on different machines but in many locations, and instances may come, go, or multiply. Yesterday’s "network fabric" does not accommodate this activity efficiently or reliably.

Project Calico, an open-source initiative started by Metaswitch Networks, is crafted to provide a new "cloud-friendly network fabric" (my term, not theirs, although they're welcome to use it).

According to Metaswitch CTO Martin Taylor, in his July 9, 2015, blog post, Calico and containers are flip sides of the same coin: "When we launched Project Calico about a year ago, our main focus was on providing a better way to network virtual machines in OpenStack environments.  But over the last 6 months or so, the Calico team has been putting a great deal of effort into the Linux container space, and Calico is now emerging as one of the front runners in container networking."

For this article, I’ve been asked to help introduce readers to Project Calico. A full explanation would take up more than the available space – and Metaswitch already provides several detailed write-ups, including the original July 8, 2014, announcement press release, plus the Project Calico website, particularly “Why Calico?”

So instead, here are some highlights from a phone conversation I had with Christopher Liljenstolpe, Director, Solutions Architecture, at Metaswitch, on Project Calico, why Metaswitch started it, and other interesting and useful things to know.

Daniel Dern: Why did Metaswitch launch Project Calico?
Christopher Liljenstolpe: While Metaswitch is mostly known for voice and VoIP infrastructure for carriers, for the past 30 years we've also been writing networking specifications that we license to OEMs – the odds are, something in your network is written by us. So we have the experience on how to work with networking software.

When we started looking at helping our carrier customers move to a virtualized environment, looking at the networking then in OpenStack and now in cloud environments, we said, “That's unnecessarily complex.”

We began working on a solution – Calico – and to maximize adoption, made it open-source, under the Apache 2.0 License.

DD: What's the relationship between Metaswitch Networks and Project Calico?
CL: It's currently sponsored by Metaswitch – most of the work is being done by our employees. But we won't do free crippleware versus a working paid version. The same code base will be available for free and is what Metaswitch will offer support contracts and consulting services for.

DD: Are there other solutions for these needs? If so, why do we need yet another one?
CL: You can use an OpenStack Neutron plugin with things like VMware's NSX, Nuage, etc. But that means you still have to use VLANs, L2 segments, and complex stateful things, and manage NAT and L3 gateways. Calico plugs into the same Neutron framework. From an API point of view, we appear the same. If, on the other hand, you use Docker, Kubernetes, Mesos, etc., you can use an "overlay" model, with tunneled infrastructures, as in OpenStack – or use a NAT mapping model, using servers' physical addresses and mapping the port numbers by using service discovery.

But both of these models do things that interfere with application function and/or hurt performance, reliability, and scaling by adding (de)-encapsulation steps, NAT mappings, and hairpin routing, for example.

I've talked with Web-scale content networks, financial services, and other companies that have internally developed, proprietary things pretty close to Calico's architecture. Most have integrated their home-grown solution into their IT, and can't or won't change from theirs to Project Calico – although we are talking with some of these companies. We see these existing internal solutions as a validation of our approach, and we can make Calico available to a broader audience.

DD: What led you to develop Calico's networking architecture?

CL: We went back and looked at the real requirements of scale-out [cloud] environments these days. These environments are containers or virtual machines that talk to each other. They use IP. And you need to be able to apply policy to enforce the characteristics of the communication between endpoints – who can talk to whom, at what speed, etc. I want to be able to say “These endpoints need to talk to endpoints,” with a policy wraparound and provide this in a way that will scale.
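To make the "who can talk to whom" idea concrete, here is a minimal Python sketch of endpoint-to-endpoint policy with a default-deny rule. It is an illustration only, not Calico's policy engine or API: the `Endpoint` and `Rule` classes, the label names, the addresses, and the port are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """A container or VM, identified by name and IP (hypothetical model)."""
    name: str
    ip: str
    labels: frozenset


@dataclass(frozen=True)
class Rule:
    """Allow traffic from endpoints with src_label to dst_label on one port."""
    src_label: str
    dst_label: str
    port: int


def allowed(rules, src, dst, port):
    """Default deny: traffic passes only if some rule matches both endpoints."""
    return any(
        r.src_label in src.labels and r.dst_label in dst.labels and r.port == port
        for r in rules
    )


# Illustrative endpoints and a single rule: web may reach db on port 5432.
web = Endpoint("web-1", "10.65.0.2", frozenset({"web"}))
db = Endpoint("db-1", "10.65.0.3", frozenset({"db"}))
rules = [Rule("web", "db", 5432)]

print(allowed(rules, web, db, 5432))  # web -> db on the allowed port
print(allowed(rules, db, web, 5432))  # reverse direction has no matching rule
```

The point of the sketch is that policy is expressed in terms of endpoints rather than network segments, so adding an endpoint means adding a label, not redesigning a VLAN.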

Rather than invent or re-invent a solution, we decided to try building an Internet-backbone-ish architecture for the data center and for the scale-out cloud. We use the same tools – BGP [Border Gateway Protocol], routing at the edge instead of at the core. We turn each compute server in the cloud into a router, routing traffic to and from the containers, VMs, and whatever else comes along.

We named this activity Project Calico. And we found out that it does work.
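The "every compute server is a router" model described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Calico's implementation: the `ComputeHost` class, the `advertise` helper (a stand-in for routes exchanged over BGP), the interface names, and all addresses are hypothetical. Only the standard-library `ipaddress` module is real.

```python
import ipaddress


class ComputeHost:
    """A compute server acting as a router for its own workloads (toy model)."""

    def __init__(self, name, address):
        self.name = name
        self.address = address  # the host's own IP on the fabric
        self.routes = {}        # ip_network -> next-hop description

    def add_workload(self, ip, interface):
        # A local container or VM gets a host route pointing at its interface.
        prefix = ipaddress.ip_network(ip)  # a bare IPv4 address becomes a /32
        self.routes[prefix] = f"dev {interface}"
        return prefix

    def learn(self, prefix, via):
        # Stand-in for a route learned from a BGP peer.
        self.routes[prefix] = f"via {via}"

    def lookup(self, dst):
        # Longest-prefix match over the routes this host knows.
        addr = ipaddress.ip_address(dst)
        matches = [p for p in self.routes if addr in p]
        if not matches:
            return None
        best = max(matches, key=lambda p: p.prefixlen)
        return self.routes[best]


def advertise(origin, prefix, peers):
    """Tell every other host that `prefix` is reachable via `origin`."""
    for peer in peers:
        if peer is not origin:
            peer.learn(prefix, origin.address)


host_a = ComputeHost("host-a", "192.0.2.11")
host_b = ComputeHost("host-b", "192.0.2.12")
hosts = [host_a, host_b]

# Each host announces the workloads it runs; no tunnels or NAT are involved.
advertise(host_a, host_a.add_workload("10.65.0.2", "veth-web"), hosts)
advertise(host_b, host_b.add_workload("10.65.0.3", "veth-db"), hosts)

print(host_a.lookup("10.65.0.3"))  # remote workload: forwarded via host-b
print(host_a.lookup("10.65.0.2"))  # local workload: delivered on its interface
```

In this sketch a packet between workloads is a plain IP packet routed hop by hop, which is why no encapsulation or address translation step appears anywhere in the lookup.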


DD: Does running Calico on each device add load to the compute servers?

CL: If I'm already using that computer to do my network forwarding, in a virtualized environment, it's already doing routing/bridging – it's already a network device. Routing is probably less work, and less load, than running an overlay. Any compute server today can hold more route data than any Cisco or Juniper switch.

DD: What are some important things to know about Calico?
CL: One, because we are using standard IP techniques, the things that people take as difficult in the data center come for free – like IPv6. Calico has supported IPv6 and included a fully functional IPv6 solution for OpenStack from Day 1. The Linux kernel knows how to do both IPv6 and IPv4. When we do run into issues like overlapping addresses, there are solutions for this, using IPv6 and IPv4... We are building on existing tools.

Two, we’ve designed Calico to support the general use case of the cloud environment (the 99.9+% case) in a very simple, scalable way – and one that degrades gracefully in failure scenarios, rather than building fragility and scale limits into the solution to support the very small corner cases that still might exist (such as non-IP workloads). Those can be handled via a special-case mechanism, just as they are on the general Internet.

DD: How does Calico interact with the network I've got?
CL: Calico talks to the outside world with existing protocols. It can talk to existing switches and routers. It's the same type of IP packets – unlike the overlay environments, where if I have to interact with legacy or hardware-based infrastructure, I have to put "on/off" ramps, de/re-encapsulate the tunnel, in front of everything that isn't part of the overlay. In the Calico model, each VM or container is just an endpoint. If you have a NAS farm, it won't have Calico on the NAS head units – and that's OK, those are IP addresses, they can talk to other IP addresses. You can turn Calico on in one node of your network and leave everything else exactly the way it is.

DD: Who's currently using/testing Calico?
CL: We are in trials with a number of large financial services firms, SaaS, and hosting providers. We also have published integrations and partnerships with projects and companies like Kubernetes, CloudSoft, Mirantis, Piston Cloud (now Cisco), Canonical, and others.

DD: Where can people get Calico?
CL: Calico is available through GitHub, or you can follow the “Get started on OpenStack VMs” and “Get started on Docker containers” guides. There is also information on getting involved and on using Calico.

DD: Thank you – this has been fun!
