Today is the 11th Annual SysAdmin Appreciation Day! In celebration, Linux.com is launching a new series titled "SysAdmin Toolbox" aimed at providing SysAdmins with all the resources needed to do their important work. This series consists of special contributions from experts throughout the IT industry. Zenoss' Mark R. Hinkle is our first contributor and shares with us a comprehensive review of Open Source Toolchains for Linux SysAdmins. Mark is also speaking at LinuxCon about using open source toolchains for cloud computing. More information about his session can be found here.
Open Source Toolchains for Linux Systems Administrators
Two of the most notable trends in systems management are DevOps and the related and partially redundant Agile Operations movement. These initiatives are popular in many Web 2.0 and cloud computing oriented companies like Twitter, Google, Yahoo! and Facebook, where the companies’ products are highly dependent on IT. In reality, though, the same practices are just as well suited to the IT administrator in a traditional organization with massive infrastructure, unrealistic workloads and a business that needs to improve efficiency to meet its goals.
DevOps transcends the silos of IT responsibilities and encourages operations personnel to jointly plan product delivery with developers. In addition, the evolving role of the systems administrator is no longer that of a reactive maintainer of infrastructure trudging through repetitive system-building tasks, but rather that of an engineer who designs and builds systems so they are highly available. In the past, systems administrators might have captured their domain knowledge in a collection of random scripts, but today’s savvy systems engineer is coding their infrastructure and making sure that code is understood and institutionalized throughout their organization.
According to RedMonk analyst Michael Coté, “Much of the emphasis on virtualization and cloud computing has been on optimizing the infrastructure, making the lives of IT admins easier and IT more affordable for companies. Building on that, the idea of DevOps is to apply cloud and cloud-inspired technologies to improve the application delivery process. The goal is to inject the spirit of Agile software development and frequent functionality delivery into the entire life of the application, not just the coding. That small goal requires a tremendous amount of technology and cultural refactoring, but the goal of making the end-user's life a little better (and worth paying more for, however "payment" is collected) is certainly worth it.”
You could probably ask three different practitioners familiar with the terms how they define them and get three slightly different answers. For the purposes of this conversation, DevOps and Agile Operations aren’t defined as technology changes but rather as professional and cultural changes. They redefine the role of the IT management professional from someone who maintains systems and IT infrastructure (the traditional systems administrator) to someone who manages and defines replicable, resilient, highly available IT systems (the systems engineer).
Accompanying this change are some tactical improvements designed to meet the goals of higher availability and improved effectiveness. While not an exhaustive list, the following practices are commonly used by agile systems engineers.
• Automated Infrastructure – Rather than executing recurring tasks by hand, administrators create mechanized processes that formalize them, using tools that generate consistent results and can be shared with other members of their team.
• Server Version Control – Changes aren’t made directly on the server; they are made in a central repository and pushed out to servers, so there is a defined process and a rollback mechanism in case of an error.
• Frequent Improvements and Releases/Updates – The old way of thinking is to change servers and other infrastructure infrequently and only in very narrow maintenance windows to minimize risk. Agile operations people make small changes frequently so systems are constantly improving. And in the event there is a problem, they can track the change that is adversely affecting their infrastructure.
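The version control and rollback practice in the list above can be illustrated with a small in-memory sketch. It is only a toy model under stated assumptions: `ConfigRepo`, `push` and the `max_clients` setting are hypothetical names invented for this example and do not correspond to any particular tool's API.

```python
"""Toy model: config lives in a central repository and is pushed to servers."""
from dataclasses import dataclass, field


@dataclass
class ConfigRepo:
    """Hypothetical central repository: every change becomes a numbered revision."""
    revisions: list = field(default_factory=list)

    def commit(self, config: dict) -> int:
        # Store a copy so later edits to the caller's dict cannot alter history.
        self.revisions.append(dict(config))
        return len(self.revisions) - 1  # the new revision number

    def checkout(self, revision: int) -> dict:
        # Return a copy so servers never share one mutable dict.
        return dict(self.revisions[revision])


def push(repo: ConfigRepo, revision: int, servers: dict) -> None:
    """Apply one revision to every server so all hosts end up identical."""
    for name in servers:
        servers[name] = repo.checkout(revision)


if __name__ == "__main__":
    repo = ConfigRepo()
    servers = {"web01": {}, "web02": {}}
    good = repo.commit({"max_clients": 150})
    bad = repo.commit({"max_clients": 0})   # a small change that turns out to be wrong
    push(repo, bad, servers)
    push(repo, good, servers)               # rollback is just pushing an earlier revision
    print(servers)
```

Because every change is a revision, the change that caused a problem can be identified and undone by pushing the previous revision again.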
To accomplish these goals the systems engineer needs an updated bag of tools. Luckily the open source management community has produced numerous tools that lend themselves to achieving these results.
Open Source Tool Chains
Software developers are very familiar with toolchains: series of programs where the output of one program forms the input for the next. A free software example would be using the GNU Emacs editor, the GNU Binutils and the GNU Compiler Collection (GCC) to write a program. Software developers frequently create programs and subroutines that are used in other programs rather than recoding the same process over and over again.
Toolchains for systems administrators have historically come in inflexible form from large commercial software vendors. Within the nascent DevOps and Agile Operations movements, communities of individual practitioners are forming to help define their own toolchains from their favorite tool choices. One such community is the DevOps Toolchain Project.
“The DevOps-toolchain project is about providing a forum for developers and administrators to document success stories and lessons learned,” says Alex Honor, one of the project’s founding members. “The focus thus far has been on practices and tooling to support provisioning, monitoring and operations in large scale fast moving environments.”
Just as software developers have different task-specific tools that make up software tool chains (e.g. editors, compilers, build scripts), systems administrators can use tool chains made up of tools that automate management functions and maintenance of Linux servers. These tools can be classified into three broad categories: provisioning, configuration management and automation, and monitoring.
Provisioning tools automate the installation of packages on the Linux server. They leverage the package systems on the server, like rpm or apt, to install packages. Some even do some cursory configuration. Configuration management and automation tools are used to set parameters or start services on a newly provisioned server. They can also be used to restore a system to a previous state when it has encountered an error. Monitoring tools collect data about servers and produce reports on availability, performance and other system stats.
Figure 1.1 Examples of Open Source Tools Well-Suited for Open Source Toolchains
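To make the idea of chaining the three categories concrete, here is a minimal sketch in which each stage takes the server state produced by the previous one. The stage functions and the `httpd` package are stand-ins invented for illustration; real provisioning, configuration and monitoring tools expose very different interfaces.

```python
"""Toy toolchain: provisioning -> configuration -> monitoring."""
from functools import reduce


def provision(server: dict) -> dict:
    # Stand-in for a provisioning tool: records which packages were installed.
    return {**server, "packages": ["httpd"]}


def configure(server: dict) -> dict:
    # Stand-in for configuration management: only starts services that exist.
    if "httpd" not in server.get("packages", []):
        raise ValueError("cannot configure httpd before it is provisioned")
    return {**server, "services": {"httpd": "running"}}


def monitor(server: dict) -> dict:
    # Stand-in for monitoring: reports "up" only if every known service runs.
    services = server.get("services", {})
    healthy = bool(services) and all(s == "running" for s in services.values())
    return {**server, "status": "up" if healthy else "down"}


def run_toolchain(server: dict, stages: list) -> dict:
    """Feed the output of each stage into the next, like a pipeline."""
    return reduce(lambda state, stage: stage(state), stages, server)


if __name__ == "__main__":
    result = run_toolchain({"name": "web01"}, [provision, configure, monitor])
    print(result)
```

Keeping each stage as a separate function mirrors the point of a toolchain: any one tool can be swapped out as long as it accepts what the previous stage produces.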
Integrating Systems Management Tools
Open source toolchains are a boon in maintaining an aggressive service level, since automation, when done correctly, is much faster and more effective than manually fixing a problem. For example, if you wanted to maintain a very highly available service level of five nines, defined as a system that has 99.999% uptime, there is very little margin for error. Over the course of a year that only allows for 5 minutes and 15 seconds of downtime before that service level is compromised. That is hardly enough time for an administrator to receive and acknowledge a page, let alone log in to a server and diagnose a problem. Besides being reactive in fixing problems by monitoring and managing the service, you can also provide a mechanism to prevent outages from happening.
A good place to start when building a tool chain is with the tools that can automate the building of a server. Using tools like these speeds deployment, makes it possible to deploy a large number of servers in a short amount of time and makes the build process easily repeatable. In the case of a severe failure, you can also use them to rebuild infrastructure.
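The downtime figure quoted above can be checked with a few lines of arithmetic. The sketch below assumes a 365-day year, which is the assumption that reproduces the 5 minutes and 15 seconds cited; the function names are made up for this example.

```python
"""Compute the yearly downtime allowed by an availability target."""
from decimal import Decimal

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def downtime_budget_seconds(availability_percent: str) -> Decimal:
    """Seconds of downtime per year permitted by a target such as "99.999"."""
    # Decimal avoids binary floating-point rounding in the subtraction.
    unavailable = (Decimal(100) - Decimal(availability_percent)) / Decimal(100)
    return unavailable * SECONDS_PER_YEAR


def format_budget(seconds: Decimal) -> str:
    """Render the budget as whole minutes and seconds, dropping fractions."""
    minutes, remainder = divmod(int(seconds), 60)
    return f"{minutes} min {remainder} s"


if __name__ == "__main__":
    for target in ("99.9", "99.99", "99.999"):
        print(target, format_budget(downtime_budget_seconds(target)))
```

For a 99.999% target the budget works out to 315.36 seconds a year, which rounds down to the 5 minutes and 15 seconds mentioned in the text.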
Early on, Linux users may have piped a list of packages to rpm to install software, a very simple tool chain. Later the process was improved using Kickstart to execute unattended Linux installs. Now Cobbler has taken that functionality to the next level by providing a way to parallelize the building of systems, configuring DHCP or DNS for both physical and virtual machines.
Cobbler also integrates with other tools like the configuration management and automation tool Puppet. After the software is installed, the services can be updated and readied for service. With the addition of automation tools like ControlTier, services can be restarted so configuration changes can take effect, closing the loop for deployment of a Linux server.
Puppet Labs founder Luke Kanies noted, “Companies of all sizes are using Puppet to manage machines from initial installation to end of life, completely avoiding manual interaction when building servers or deploying new application versions.”
Another example of an open source tool chain is the integrated provisioning, configuration and automation tools used in the Red Hat sponsored Genome project. Genome is a set of tools that allows users to maintain cloud-based infrastructure. A use case for Genome would be to deploy a multi-tiered web application that includes an Apache reverse proxy tier in front of a JBoss application server tier, which in turn connects to replicated PostgreSQL databases.
Figure 1.2 Genome Architecture
Most of the tools mentioned so far are active; they make changes and do work. However, they often lack information about the current state of a system. That’s where monitoring comes in. The role of monitoring for the traditional systems administrator is to alert them when a fault occurs, usually via a page or an email. However, monitoring tools (e.g. Nagios, OpenNMS and Zenoss Core) can and should do much more by providing insight into the performance and capacity of servers. This information can be used to inform the actions of the active tools. Some monitoring tools even provide interfaces to kick off processes in other tools. For example, Zenoss Core can, based on a monitoring event, reconfigure a service through Cfengine, Chef or Puppet.
An operational demonstration of this type of leverage, called DevOps GameDay, was given at the O’Reilly Velocity Conference. The exercise consisted of a scenario where a web application was hosted in both East and West coast datacenters of Amazon’s Elastic Compute Cloud. The administrators caused a failure of some of the servers in the West coast datacenter. The infrastructure was monitored with Zenoss Core, which captured the failure and notified Opscode’s Chef to take action. Chef then updated the Dynect Platform’s service via the Dynect API to reroute traffic to the East coast facility. Once the new servers went live, Chef pushed the new instances into Zenoss, which started monitoring them in the remediated architecture.
Figure 1.3 Sample Web Application Failover Toolchain
This demonstration was conducted as an example of how infrastructure automation could be used to improve recovery of systems. In this case, the system had built-in redundancy with multiple web servers and DNS load balancing, and when failure was introduced the infrastructure was able to recover automatically in less than 90 seconds. Although this was a simple demonstration, the same design points could be applied using other tools and other use cases.
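The control flow of the GameDay scenario (a monitoring event triggers a DNS reroute, then the new servers are registered for monitoring) can be sketched as a simulation. Everything here is an in-memory stand-in: the `Dns` and `Monitor` classes, the event fields and the host names are hypothetical and are not the Zenoss, Chef or Dynect interfaces.

```python
"""Simulated failover: monitoring event -> DNS reroute -> monitor new hosts."""
from dataclasses import dataclass, field


@dataclass
class Dns:
    """Stand-in for a managed DNS service that directs traffic to one region."""
    active_region: str = "west"

    def reroute(self, region: str) -> None:
        self.active_region = region


@dataclass
class Monitor:
    """Stand-in for a monitoring system's inventory of watched hosts."""
    watched: set = field(default_factory=set)

    def register(self, host: str) -> None:
        self.watched.add(host)


def handle_failure(event: dict, dns: Dns, monitor: Monitor,
                   standby_region: str, new_hosts: list) -> bool:
    """React to a monitoring event; return True if a failover was performed."""
    # Ignore events that are not critical or that concern a region
    # which is not currently serving traffic.
    if event["severity"] != "critical" or event["region"] != dns.active_region:
        return False
    dns.reroute(standby_region)      # send traffic to the healthy region
    for host in new_hosts:           # start watching the replacement servers
        monitor.register(host)
    return True


if __name__ == "__main__":
    dns, monitor = Dns(), Monitor({"west-web01"})
    event = {"severity": "critical", "region": "west"}
    handle_failure(event, dns, monitor, "east", ["east-web01", "east-web02"])
    print(dns.active_region, sorted(monitor.watched))
```

The point of the sketch is the closed loop: the monitoring side supplies the trigger, the automation side acts, and the result is fed back into monitoring.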
For years proprietary software management vendors have tried to provide full server lifecycle management via broad management suites. These products are typically composed of technology developed in a secretive environment and then supplemented by acquiring technology from smaller companies. They are cobbled together in a way that allows them to be sold to a broad audience, with long lists of features that often are long on promises and short on results. They also lack input from users until after the product is brought to market and, even then, those users have to pay significantly to use that product, forming small exclusive communities of users.
Open source toolchains have a lot of advantages for managing infrastructure. First, they provide a choice in what elements a user wants to use. Second, technology is developed in the open and input is accepted from end users as part of the software development process, better informing the feature set. Third, open source tools foster innovation and are inclusive, encouraging participation since they are free, with little barrier to using them or contributing to their development. Finally, the culture of open source software favors open standards, interoperability and open APIs, which make the tools easy to integrate.
According to John M. Willis, VP of Services at Opscode, “Managing infrastructure in a cloudy world can no longer rely on slow paced non-integrated legacy software. At Opscode we view open source, automated infrastructure and systems integration as the last mile of highly scalable systems. Today’s infrastructures need to be managed with speed, frequency and agility. Relying on non-integrated closed solutions can’t keep pace with the changing world. Open API’s and single touch transactions is the new standard for highly scalable operations.”
Regardless of your infrastructure, there is probably an advantage to be had by creating your own toolchain with open source tools.
By Mark R. Hinkle
VP of Community, Zenoss