April 20, 2010

A Guide to Cloud Computing on Linux

This may not be the year of the Linux desktop, but it's definitely the year of Linux powering cloud computing. Even though cloud computing is gaining popularity, it's still not well understood. Want a bit more on the basics of cloud computing? Read on!

Behind the smokescreen of hype, there's actually something to cloud computing. You're already a consumer of cloud computing in the same way that we're all Linux users. Using Amazon or Gmail? You're using cloud computing. But that's not the same as working directly with cloud solutions.

Just like virtualization, the right cloud solution — if any — depends entirely on the workloads you have and your requirements for data handling. Most businesses of any size are probably using at least some cloud computing in the form of Software-as-a-Service (SaaS), if nothing else.

What is Cloud Computing?

Cloud computing, at least as initially defined, comprises on-demand computing delivered over the Internet. This includes several types of computing. First, there are Platform-as-a-Service (PaaS) offerings that allow users to run applications on cloud infrastructure, such as Google's App Engine. The infrastructure is completely controlled by the service provider, and the customer doesn't need to worry about managing the systems or infrastructure that the service runs on. In fact, the user may not know whether the underlying platform is running on Linux, Windows, FreeBSD, some mixture of all the above, or something else entirely. All they need to know is the interface and how to run jobs on the system.

Next is Infrastructure-as-a-Service (IaaS), such as Amazon's Elastic Compute Cloud (EC2), which delivers on-demand, scalable computing over the Internet. Organizations can deploy workloads that grow and shrink to meet demand, running operating systems or other infrastructure on top of the provider's computing service. Again, customers don't manage the underlying hardware or platforms their infrastructure runs on; they simply define the level of service they need and run their infrastructure on top of that.

Finally, there's SaaS, which has been around for a while now. Instead of installing and running software on your own infrastructure, you use software that runs on someone else's infrastructure in a pay-per-use model. Some SaaS, like Google Docs, might be entirely free to the end user. Other services charge per user, by service tier, or a combination of the two. Typically, SaaS offerings are Web-based and run through the browser, like the 37 Signals suite.

You'll also find a lot of packages that are designed to be offered as SaaS by third parties. For instance, there are the Parallels Automation and Plesk Control Panel offerings from Parallels, or the Open-Xchange groupware that hosting providers can customize and deploy.

It should go without saying, though I'll say it anyway, that Linux powers the bulk of cloud computing solutions. You'll find some Windows-based offerings, but Amazon, Google, and other major players are running their cloud infrastructure on top of Linux.

Grid, Cloud, What's the Difference?

Before there was cloud computing, there was grid computing and Sun telling us that "the network is the computer." Is there any difference between grid and cloud computing, or is it just technical hairsplitting? Though the two are similar, it's easy to make the distinction between grid computing and cloud computing.

The best description I've seen so far came from RightScale's blog, attributed to Rich Wolski of the Eucalyptus Project. Wolski describes grid computing as suited to environments where users make fewer requests but for larger allocations of computing power. So a project may only have a few jobs to run, but they're large jobs and tend to consume a fair amount of computing power.

Conversely, cloud computing consists of a lot of smaller requests. Think of applications running on App Engine, or users hitting a SaaS offering. Each request is minimal and the data sets are typically smaller, but the number of requests over time is much greater.

Clouds and Appliances

If you want to talk about mixed metaphors, think about running a software appliance in the cloud. Though it's a jarring clash of metaphors, the actual practice of deploying software appliances in the cloud is smooth as silk. (To use yet another metaphor!)

Some vendors are packaging their software as virtual appliances that can run on top of your internal infrastructure or on cloud computing services. For instance, the BitNami folks have been packaging popular open source stacks and applications to run on top of VMware, Amazon's EC2, and MyGSI GoGrid. (You can also run the stacks on regular servers, if you're still doing old-fashioned computing...)

Appliances on top of cloud platforms simplify deploying and managing applications. Rather than having to provision your own hardware and deal with software dependencies, you can simply fire up a virtual appliance and start using it.

Benefits and Tradeoffs of Cloud Computing

The key point here is that cloud computing takes some of the complexity out of acquiring and managing computing resources, whether that means just firing up a browser to run an application, running an appliance in the cloud, or having a platform to run applications on without any concern about the underlying infrastructure.

If your shop uses a SaaS application, many of the headaches around maintaining the software go away. No need to worry about upgrades; they're delivered seamlessly. No need to worry about maintaining licenses for the software; that should be handled automatically. SaaS applications that are deployed as Web apps also let users be much more mobile. I can get to Highrise or Google Docs from virtually any computer with a decent Internet connection and a modern Web browser.

Just as virtualization has helped improve the flexibility organizations have in managing computing resources internally, cloud computing adds another level of flexibility. While virtualization can help manage an organization's resources more efficiently and provide better quality of service, cloud computing allows an organization to bypass acquiring the infrastructure altogether and simply deploy workloads on someone else's infrastructure. That makes capacity and storage planning, hardware lifecycles, and so on the provider's problem.

The flip side is that organizations give up a fair amount of control over their computing and data when dealing with the cloud. You're depending on the cloud provider to keep your data safe and ensure reasonable uptime, and you're putting your computing power in someone else's hands. The good news is that you don't have to worry about the specifics of ensuring uptime, but you'll pay the price if the provider fails to deliver it, or if there are connectivity problems, outside both their control and yours, on the networks that bridge the cloud services and your organization.

For larger providers like Amazon or Google, downtime is relatively rare. Not unheard of, though. And when those services do fail, the outages tend to be widely noticed.

Continuity of services goes beyond mere uptime, though. If a SaaS provider changes features, or a PaaS provider changes a programming interface, that can also affect services. When relying on any type of cloud service, it's a good idea to work with the vendor ahead of time to find out what their Service Level Agreements are, what remedies you'll have in the event of downtime, and how much notice you'll get before new releases of the software you use are rolled out. Do you have access to a beta or developer network to test changes before they go live? Will you get enough notice of changes to ensure that the workforce is trained on new versions?

This isn't to say that cloud computing is inherently riskier, but to point out that the risks with cloud computing change slightly.

A Cloud of Your Own

What if you want the benefits of cloud computing but also want to retain control of your computing resources?

Lately, a few vendors have started offering private cloud and hybrid cloud solutions that let organizations set up cloud solutions internally or bridge their existing infrastructure with other services. Google, for example, provides the Secure Data Connector service, which lets organizations store data internally and access it through Google Apps.

This gives organizations a bit more control over their data, with some of the benefits of cloud computing. Cloud providers have found out pretty quickly that some organizations will happily host some of their data with third parties, but not all of it.

Or organizations can set up a fully private cloud using a solution like the open source Eucalyptus, which offers infrastructure software to set up a cloud environment in your own data center. Eucalyptus implements the same interfaces as Amazon Web Services (EC2 and S3), so customers can even move workloads from Amazon to their own private cloud, or test on an internal cloud and roll out to Amazon when ready. Even though Eucalyptus is a cloud solution, it's possible to deploy it on a single system to test and work with the software, or to scale it up and run it on many servers.

The upside to this is greater flexibility in managing workloads, but it still requires your organization to manage the hardware and systems that Eucalyptus (or another private cloud solution) runs on top of. This can be less cost-effective than using external cloud services, but it might be a better solution for organizations that want more control over their computing.

Clear Skies

In the final installment, running next week, we'll wrap up by talking about strategies for working with virtualization and cloud computing, and which solutions might be best for specific workloads.
