Editor's Note: This is a guest post by Neil Levine, VP Product for Inktank, sponsor of the Ceph Project.
A decade ago, as CTO of a large service provider, I was lucky to be able to drive an open-source-everywhere strategy. In addition to the ubiquitous LAMP stack, we managed to use open-source software in almost every part of the business, not just in the data center but also in departments like accounts and HR. However, there were two holdouts against the power of open source: storage and networking.
The funky purple NetApp boxes and Cisco’s sleek black boxes not only ran the heart of our operations, they were the all-devouring black holes in the annual budget. It was hard to wean ourselves off them: fear of data loss or slow performance was the softly-voiced threat against a rapidly growing business. While I knew it was really just clever software on dumb hardware, the lack of alternatives sustained the fear and inertia.
But over the course of the past decade, the reality changed in both storage and networking, and we are almost at the stage where you can run your data center entirely on open source. How did this happen and what does this mean for the CIO?
With many brains, all problems become software problems
The past decade saw a number of trends that together created this new reality.
The first is the success of open source itself. The innovation culture, especially in universities and commercial organizations, is now heavily biased towards open source, where ideas can be improved and bugs squashed quickly. This has meant that when paradigm-changing ideas prove their worth, like object storage or the OpenFlow networking protocol, there is often a ready-to-use reference implementation, with an open-source license, for companies to take and build a commercial venture around. When going to market, it makes no sense to kill the open community that helped birth the project.
At the same time, we have seen the perfection of distributed, scale-out system architectures, as championed by the kings of scale: the Web 2.0 companies and cloud providers like Amazon Web Services. Explicitly designed to be vendor-neutral towards the hardware, and reflecting the reality that the bulk of the smarts was never in the hardware anyway, these architectures helped turn storage and networking into yet another software problem to solve.
While these scale companies often fiercely guard their implementations, this being their competitive advantage, they have been liberal with their dissemination of the theories and concepts. And where a software idea is shared, an open-source project almost always forms.
But the shift has not been driven by the supply side alone. On the demand side, companies are now both wise to the cost savings that open source delivers and in urgent need of them.
The Fierce Urgency of Tomorrow
CIOs feel a real sense of dread when they look at certain parts of their infrastructure over the next five years, particularly on the storage side, as demand goes up while budgets remain flat.
As Jay Parikh, the VP of Engineering at Facebook, said, “Our big data challenges that we face today will be your big data challenges tomorrow,” and, as Facebook worked out, storing everything on proprietary technology is impossible if costs are to be kept under control.
While traditional enterprise IT departments are conservative towards change, they now have the perfect vehicle to explore new options as they deploy private cloud technologies like OpenStack or Apache CloudStack. It is no surprise to see so many new storage and networking vendors participating in these communities: private cloud projects are the soft underbelly of corporate IT.
These platforms offer the opportunity to gain experience deploying scale-out storage technologies or implementing software-defined networking, with a view to moving legacy workloads to them. And now, these deployments are almost always carried out under a corporate policy of keeping everything 100% open source to hold costs down and ensure vendor flexibility.
So what are the practical realities of running a 100% open-source data center?
On the storage side, the market for open-source products is now fairly mature and competitive, and the reality is achievable today. Ceph, a unified platform for object, block and file storage (disclaimer: I work for Inktank, the commercial sponsor of the project), is seeing large-scale deployments at Fortune 500 companies, such as Bloomberg, AT&T and Intel, for both cloud and traditional storage needs. OpenStack Swift and Basho’s Riak CS are also introducing admins to object storage, while Lustre and GlusterFS offer open-source file systems for legacy workloads.
In a case of open source becoming mature enough that it is trying to disrupt itself, many of these same storage technologies are also positioning themselves as alternatives to HDFS, the open-source storage system underpinning the Apache Hadoop ecosystem.
On the networking side, it’s fair to say that the market is less mature, though there are some interesting projects.
Cumulus Networks has just announced a new Linux distribution, specifically designed to be used as a foundation for network products. Another recent announcement was the formation of the OpenDaylight Project, being run under the Linux Foundation, with some heavyweight but traditionally proprietary vendors like Cisco and Juniper participating. Chris Wright of Red Hat, one of the members of the project, said that by collaborating around shared problems the project aims to “build an ecosystem which is not a bunch of vendor-specific implementations but also allows vendors to differentiate.” He also stressed how, in true open-source fashion, the project hopes to allow users to innovate on the software in ways the members of the project can’t even foresee. While it is too early for products to appear in the market, it is only a matter of time.
We are the 100 Percent
While much press coverage now focuses on how open source can create new markets, particularly in the Big Data and NoSQL spaces, the future now looks extremely positive for open-source technologies in the markets that have been most resistant to their effects.
In sum, we are almost at the point where an IT department can run 100% open source in the data center, with all the benefits of no vendor lock-in, cost savings and rapid innovation that characterized the disruptions Linux and MySQL brought to the operating system and database markets.
Neil Levine is VP Product for Inktank, sponsor of the Ceph Project. With a background in large systems infrastructure and open-source software, Neil is responsible for Inktank’s product strategy. Neil was co-founder and VP Product at venture-backed start-up Nodeable, and before that was GM/VP at Canonical, the commercial sponsors of the Ubuntu operating system, where he ran the unit which built tools and delivered services to Ubuntu’s enterprise customers, as well as being responsible for server and cloud product strategy. As CTO of Claranet, one of the largest independent internet service and hosting providers in Europe, Neil helped build the company from a 15-employee organization to a 650-strong company operating in nine countries.