The goal of this article is to review the history and architecture of Linux as well as its present day developments to understand how Linux has become today's leading platform for cloud computing. We will start with a little history on Unix system development and then move to the Linux system itself.
The story of Linux as a platform for cloud computing starts in 1969 with the creation of the Unix2 Operating System at AT&T Bell Laboratories. Unix was first developed on mini-computers, which had very small memory address spaces by today's standards. The PDP-11 (one of the main systems used for the early development of Unix) had an address space of 64 thousand bytes of memory for instructions, and (on some models) 64 thousand extra bytes for data. Therefore the kernel of the operating system had to be very small and lean.
Moving from its original architecture of the PDP-7, onto the PDP-11 (and later onto other architectures), the kernel also divided into architectural independent and architectural dependent parts, with most of the kernel migrating from machine language into the “C” language. The advantage of this architectural move was two-fold: to isolate the parts of the kernel that might be affected by vulgarities in the hardware architecture and to remove as much as possible the tediousness of writing in non-portable machine-language code, which typically led to a more stable operating system.
The kernel of Unix provided only a few “services” for the entire system. The kernel scheduled tasks, managed real memory, handled I/O and other very basic functions. The major functionality of the system was created by libraries and utility programs that ran in their own address spaces. Errors in these non-kernel libraries and utilities did not necessarily cause the entire system to fail, making the system significantly more robust than operating systems that did a great deal of functions in the kernel.
As a time-sharing system it had to have a certain amount of security designed into it to keep one user's data and programs separate from another, and separate from the kernel. The kernel was written to run in a protected space. A certain amount of robustness was also necessary, since a “fragile” operating system would not be able to keep running with dozens or hundreds of users and thousands of processes running at the same time.
Early in the life of Unix, client/server computing was facilitated by concepts like pipes and filters in the command line, and client programs that would talk with server programs called “daemons” to do tasks. Three of the more famous daemons were the printer subsystem, the “cron” (which executes various programs automatically at times specified) and the e-mail subsystem. All of these had “client” programs that would interact with the human on the command line. The client program would “schedule” some work to be done by the server and immediately return to the on-line user. The server programs had to be able to accept, queue and handle requests from many users “simultaneously.” This style of programming was encouraged on Unix systems.
With Unix it was easy and common to have multiple processes operating in the “background” while the user was executing programs interactively in the “foreground.” All the user had to do was put an ampersand on the end of their command line, and that command line was executed in the “background.” There was even an early store-and-forward email system called uucp (which stood for “Unix-to-Unix Copy”) that would use a daemon to dial up another system and transfer your data and email over time.
As Unix systems moved to larger and faster hardware, the divisions of the software remained roughly the same, with additional functionality added as often as possible outside the kernel via libraries, and as seldom as possible inside the kernel. Unix systems had a relatively light-weight process creation due to the command executor's (the “shell”) pipe-and-filter based syntax, so through time, the kernel developers experimented with ever lighter-weight start-up processes and thread execution until Unix systems might be running hundreds of users with thousands of processes and tens of thousands of threads. Any poorly designed operating system would not last long in such an environment.
Unix systems were moving onto the networks of the time, Ethernet and the beginnings of the Arpanet. Design was going into remotely accessing systems through commands like rlogin and telnet, later to evolve to commands like ftp and ssh.
Then Project Athena of MIT offered the Unix world both a network-based authentication system (Kerberos) and eventually the X Window System, a client/server based, architecture neutral windowing system, both continuing the network service-based paradigm. In the last years of the 1990s, many Unix vendors started focusing on server systems, building systems scaling dramatically through Symmetrical Multi-Processing (SMP), high availability through system fail-over, process migration and large, journaled filesystems.
At the start of the twenty-first century Unix systems had become a stable, flexible set of operating systems used for web servers, database servers, email servers and other “service-based” applications. The problem remained that closed source commercial Unix systems were typically expensive, both for vendors to produce and for customers to buy. Vendors would spend large amounts of money duplicating each other's work in ways that the customers did not value.
Large amounts of effort were made in gratuitous changes to the many utility programs that came with Unix. Delivered to customers from various vendors, the commands worked a slightly different way. What customers of the day wanted was exactly the same Unix system across all their hardware platforms.
This general background in mind, we look at the modern-day Linux system and see what Linux offers “cloud computing” above and beyond what Unix offered.
In 1991, the Linux kernel project was started. Leveraging on all of the architectural features of Unix, the levels of Free Software from GNU and other projects, the Linux kernel allowed distributions of Free Software to take advantage of:
• flexibility in the Unix architecture combined to tailor specific packages to the needs of the user
• lower cost of collaborative development, combined with flexible licensing for service-based support
• same code base across a wide variety of standards-based hardware
Linux continued the overall design philosophies of Unix systems, but added:
• functionality outside the kernel if efficiently possible
• network and API based functionality
• programming to standards
while the “openness” of its development and distribution allows for development and deployment of features and bug fixes outside the main development stream.
Years ago there was a request for journaling file systems in Linux, and several groups offered their code. The mainstream developers felt that the “time was not right,” but the openness of the development model allowed various groups to integrate these filesystems outside of the mainstream, giving customers that valued the functionality the chance to do testing and give feedback on the functionality of the filesystems themselves. In a later release many of these filesystems went “mainstream.”
While not everyone suffers the same extent from the effects of any particular bug, some bugs (and especially security patches) cause great disruptions. FOSS software gives the manager the ability to more quickly apply a bug fix or security patch that is affecting their systems. Linux gives back control to the manager of the system, instead of control remaining in the hands of the manager of the software release.
With potentially millions of servers (or virtual servers) you may get great efficiencies from having distributions tailored to your hardware than the individual software manufacturer would provide. When you have a million servers, a one-percent performance improvement might save you ten thousand servers. There is little wonder why companies like Google and Yahoo use Linux as their base of cloud computing.
In the mid 1990s a concept appeared called “Beowulf Supercomputers,” which later became what today people call “High Performance Computing” (HPC). Most of the worlds fastest supercomputers use Linux so concepts such as checkpoint/restart and process migration started to appear. Management systems evolved that could easily configure, start and control the thousands of machines that were inherent in these HPC systems.
The same basic kernel and libraries used on these supercomputers could also be run on the application developer's desktop or notebook, allowing application programmers to develop and do initial testing of super-computing applications on their own desktop and notebook systems before sending them to the supercomputing cluster.
In the late 1990s and early 2000s, virtualization started to occur with products like VMware, and projects like User Mode Linux (UML), Xen, KVM and VirtualBox were developed. The Linux community led the way, and today virtualization in Linux is an accepted fact.
There are also several security models available. Besides the Kerberos system, there is also Security Enhanced Linux (SELinux) and AppArmor. The manager of the cloud system has the choice of which security system they want to use.
It is also easy to “rightsize” the Linux-based system. The more code that is delivered to a system, the more space it takes up, typically the less secure it is (due to exploits) and the less stable it is (with code that is less-used still being available to create execution faults). FOSS allows a system manager (or even the end user) to tailor the kernel, device drivers, libraries and system utilities to just the components necessary to run their applications, not only on the server side of “The Cloud,” but on the thin client side of “The Cloud,” allowing the creation of a thin client that is just a browser and the essential elements necessary to run that browser, reducing the potential of exploits on the client, all without a “per seat” license to worry about.
If a closed source vendor decides to stop supporting functionality or goes out of business, the cloud system provider has no real recourse other than migration. With FOSS the business choice can be made of continuing that service using the source code from the original provider and integrating that code themselves, or perhaps enticing the FOSS community to develop that functionality. This provides an extra level of assurance to the end users against functionality suddenly disappearing.
Linux provides an opportunity for a cloud service provider to have direct input to the development of the operating system. Lots of closed source software providers listen to their customers, but few allow customers to see or join the development (or retirement) process. The open development model allows many people to contribute. Linux supports a wide range of networking protocols, filesystems, and native languages (on a system or user basis). Linux supports RAID, both software RAID and various hardware RAID controllers.
Linux has a very permissive licensing policy with respect to numbers of machines, of processors per machine and users per machine. The licensing cost in each case is “zero.” While vendors of Linux may charge for support services based on various considerations, the software itself is unrestricted. This makes running a data-center easier than accounting for software licenses on a very difficult licensing schedule as required by some closed source companies.
Finally, size does matter, and while Linux kernels and distributions can be tailored to very small sizes and features sets, Linux was able to support 64-bit virtual address spaces in 1995. For over fifteen years Linux libraries, filesystems and applications have been able to take advantage of very large address spaces. Other popular operating systems have had this feature for just a short time, so their libraries and applications may be immature.
The area that allows “The Cloud” to work is networking. Linux supports a wide range of network protocols. Linux supports not only TCP/IP, but X.25, Appletalk, SMB, ATM, TokenRing and a variety of other protocols, often as both a client and a server. Early uses of Linux were to act as a file and print server system and email gateway for Apple, Windows, Linux and other Unix-based clients.
Network security features such as VPNs and firewalls delivered in the base distributions combined with the robustness and low cost of the operating system and the low cost of commodity-based hardware to make Linux the operating system of choice for ISPs and Web-server farms in the early 2000s.
More Than Just the Base Operating System
“Cloud Computing” is more than just the kernel and the base operating system. Standard tools are needed on the platform to allow you to develop and deploy applications. Languages associated with “The Cloud” (PHP, Perl, Python, Ruby) started out as FOSS projects and for the most part continue to be developed on FOSS systems. Many of the new web applications and frameworks get developed on Linux first, and then ported to other Unix (and even Windows) systems.
Even with all these features, Linux would not be as useful for Clouds without some of the cloud framework models that are evolving.
Cloud frameworks typically help in-house systems teams set up and manage “private clouds.” Set up to be compatible with public clouds, instances of virtual environments may be transferred back and forth to allow for local development and remote deployment. Companies may also run the applications in-house under “normal” conditions, but utilize “public cloud” resources under times of heavy load.
Cloud frameworks typically support many styles of virtualized environments with several common distributions. While it is beyond the scope of this article to go into each and every framework, these are two of the main frameworks of today:
Eucalyptus is a FOSS Cloud architecture that allows private clouds to be implemented in-house and supports the same APIs as “public” cloud-based environments such as Amazon's Web Services. It supports several types of Virtualization, such as Xen, KVM, VMware and others. Eucalyptus is compatible and packaged with multiple distributions of Linux, including Ubuntu, RHEL, OpenSuse, Debian, Fedora and CentOS.
OpenQRM is another architecture that allows you to create an in-house “cloud” that supports EC2 standards of APIs. It also supports virtualization techniques such as KVM and Xen to allow you to manage physical and virtual machines and deployments. Virtualized images of Ubuntu, Debian and CentOS are supplied for rapid deployment.
Linux Distributions: Heading into the Clouds
At the risk of missing one of the commercial distributions, this article will mention Ubuntu's Cloud program based on Eucalyptus, Red Hat's Enterprise MRG Grid in conjunction with Amazon's EC2 program, and (while not exactly the same as the first two) SuSE Studio for creating virtualized environments to run under Xen.
It is hoped that this article shows how the architecture of Linux, somewhat guided from its Unix past but enhanced by present day techniques and developments, creates a standard, robust, scalable, tailorable, portable, cost-effective environment for cloud computing, an environment that the cloud supplier and even the end-user can not only “enjoy” but participate in and control.
I would like to acknowledge the input of some of the members of the Greater New Hampshire Linux User's Group: Bill McGonigle, Ken D'Ambrosio, Tom Buskey, and Brian St. Pierre in adding to my article about why Linux systems make good cloud computing platforms.