Not too long ago, software development was done a little differently. We programmers would each have our own computer, and we would write code that did the usual things a program should do, such as read and write files, respond to user events, save data to a database, and so on. Most of the code ran on a single computer, except for the database server, which was usually a separate computer. To interact with the database, our code would specify the name or address of the database server along with credentials and other information, and we would call into a library that would do the hard work of communicating with the server. So, from the perspective of the code, everything took place locally. We would call a function to get data from a table, and the function would return with the data we asked for. Yes, there were plenty of exceptions, but for many desktop applications, this was the general picture.
But, there's something missing from this picture: The hardware. The servers. That's because our software was pretty straightforward. We would write a program and expect that there’d be enough memory and disk space for the program to run (and issue an error message if there wasn’t). Of course, larger corporations and high-tech organizations always had more going on in terms of servers, but even then, software was rarely distributed, even in the case of central servers. If the server went down, we were hosed.
A Nightmare Was Brewing
This made for a lot of nightmares. The Quality Assurance (QA) team needed fresh computers to install the software on, and it was often a job that both the developer and the tester would do together. And, if the developer needed to run some special tests, he or she would ask a member of the IT staff to find a free computer. Then, he or she would walk to the freezing cold computer room and work in there for a bit trying to get the software up and running. Throughout all this, there was a divide between groups. There were the programmers writing code, and there were the IT people maintaining the hardware. There were database people and other groups. And each group was separate. But the IT people were at the center of it all.
Today software is different. Several years ago, somebody realized that a good way to keep a website going is to create copies of the servers running the website code. Then if one goes down, users can be routed to another server. But this approach required changes in how we wrote our code. We couldn't just keep a user's login information on a single computer unless we wanted to force the user to log back in after that server died and another took over. So we had to adjust our code for this and similar situations.
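To make the login problem concrete, here is a minimal sketch in Python. The class names (SharedSessionStore, WebServer) and the server names are hypothetical, and the in-memory dictionary only stands in for whatever external store (a database or a cache) all of the replicated servers can reach. The point is simply that the login state lives outside any one web server.

```python
import secrets


class SharedSessionStore:
    """Stand-in for an external store that every web server can reach."""

    def __init__(self):
        self._sessions = {}

    def create(self, username):
        # Hand the user a random token and remember who it belongs to.
        token = secrets.token_hex(16)
        self._sessions[token] = username
        return token

    def lookup(self, token):
        # Returns None when the token is unknown.
        return self._sessions.get(token)


class WebServer:
    """One replica of the website code. It keeps no login state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, username):
        return self.store.create(username)

    def handle_request(self, token):
        user = self.store.lookup(token)
        if user is None:
            return f"{self.name}: please log in"
        return f"{self.name}: hello, {user}"


store = SharedSessionStore()
server_a = WebServer("server-a", store)
server_b = WebServer("server-b", store)

token = server_a.login("alice")
# Pretend server-a has died and the next request is routed to server-b.
reply = server_b.handle_request(token)
print(reply)
```

Because both replicas consult the same store, the user who logged in through one server is still recognized by the other.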
Gradually our software grew in size as well. Some of our work has moved to other servers. Now we're dealing not only with servers that are copies of each other (replicated), but also with software that is distributed among multiple computers. And our code has to be able to handle this. The problem I described earlier, the time spent in the refrigerated computer room just trying to get the software installed, is still an issue with this distributed and replicated architecture. But now it's much harder. You can no longer just request a spare PC to go test the software on. And QA staff can no longer just wipe a single computer and reinstall the software from a DVD. Just the installation alone is a headache. What external modules does your app need? How is it distributed among the hardware? And exactly what hardware is needed?
This situation requires the different groups to work closely together. The IT team who manages the hardware can't be expected to just know what the developer's software needs. And the developer can't be expected to automatically know what hardware is available and how to make use of it.
DevOps to the Rescue
Thus we have a new field where the overlap occurs, which is a combination of developer and operations, called DevOps (see Figure 1 above). This is a field both developers and IT people need to know. But let's focus today on the developers.
Suppose your program needs to spawn a process that does some special number crunching that would be well suited to running on four separate machines, each with 16 cores, with the code distributed among those 64 cores. Once you have the code written, how will you try it out?
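Before worrying about where those machines come from, it helps to see how the work itself might be divided. The following is only a sketch in Python: the function names (split_work, plan, crunch) are made up for illustration, the sum of squares is a placeholder for the real number crunching, and everything runs sequentially on one computer just to check that the 64 partial results add up to the right answer.

```python
MACHINES = 4
CORES_PER_MACHINE = 16


def split_work(items, parts):
    """Split items into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def plan(items):
    """Map each (machine, core) slot to the chunk of work it would handle."""
    slots = [(m, c) for m in range(MACHINES) for c in range(CORES_PER_MACHINE)]
    return dict(zip(slots, split_work(items, len(slots))))


def crunch(chunk):
    """Placeholder for the real number crunching: a sum of squares."""
    return sum(n * n for n in chunk)


numbers = list(range(1000))
assignments = plan(numbers)

# Run every slot's chunk in turn and combine the partial results.
partial_results = [crunch(chunk) for chunk in assignments.values()]
print(len(assignments), "slots")
print(sum(partial_results) == crunch(numbers))
```

In a real deployment each slot's chunk would be shipped to its own core on its own machine, which is exactly the part that needs hardware to test on.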
The answer is in virtualization. With a cloud infrastructure, you can easily provision the hardware that you need, install the virtual operating systems on the virtual servers, upload your code, and have at it. Then when you're finished working, you can shut down the virtual machines, and the resources return to a pool for use by other people. That process works for your testing, but in a live environment, your code might itself need to do the work of provisioning the virtual servers and uploading the code. Thus, your code must now be more aware of the hardware architecture.
Developers must know DevOps -- in the areas of virtualization and cloud technology, as well as hardware management and configuration. Most organizations have limited IT people and they can't sit beside the developers and help out. And managing the hardware from within the program requires coding, which is what the developers are there for. The line between hardware and software is more blurry than it used to be.
What to Learn
So where can you, as a programmer, learn the skills of DevOps? The usual places online are great places to start (our own Linux.com and various sites).
As for what to learn, here are some starters:
Learn what virtualization is and how, through software alone, you can provision a virtual computer and install an operating system and a software stack. A great way to learn this is by opening an account on Amazon Web Services and playing around with their EC2 technology. Also, learn why these virtual servers are quite different from the early attempts, whereby one single-core computer would try to run multiple operating systems simultaneously, resulting in a seriously slow system. Today's virtualization uses different approaches so this isn't a problem, especially since multi-core computers are mainstream. Learn how block storage devices are used in a virtual setting.
Learn about some of the new open source cloud platforms such as OpenStack. OpenStack is a framework that lets you provision virtual hardware in much the same way you can on Amazon Web Services.
Learn network virtualization (Figure 2). This topic alone can become a career, but the idea is that you have all these virtual servers sitting on physical servers; while those physical servers are interconnected through physical networks, you can actually create a virtual network whereby your virtual servers connect in other ways, using a separate set of IP addresses in what's called a virtual private network. That way you can, for example, block a database server from being accessed from the outside world, while the virtual web servers are able to access it within the private network.
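The database rule just described can be pictured with a few lines of Python. This is only a sketch: the 10.0.0.0/24 range and the function name may_reach_database are assumptions chosen for illustration, and in practice the cloud platform's network settings enforce the rule, not your application code.

```python
import ipaddress

# Assumed address range for the virtual private network.
PRIVATE_NETWORK = ipaddress.ip_network("10.0.0.0/24")


def may_reach_database(source_ip):
    """Allow a connection only if it comes from inside the private network."""
    return ipaddress.ip_address(source_ip) in PRIVATE_NETWORK


# A virtual web server inside the private network.
print(may_reach_database("10.0.0.12"))
# A machine somewhere in the outside world.
print(may_reach_database("203.0.113.9"))
```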
Now learn how to manage all this, first through a web console, and then through programming, including with code that uses a RESTful API. And, while you’re there, learn about the security concerns and how to write code that uses OAuth2 and other forms of authentication and authorization. Learn, learn, learn as much as you can about how to configure ssh.
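Here is a rough idea of what managing servers through a RESTful API looks like from code, using only Python's standard library. Everything specific is hypothetical: the cloud.example.com endpoint, the JSON fields, and the helper name build_create_server_request are invented for illustration, and a real provider documents its own URLs and parameters. The token is assumed to have been obtained already through an OAuth2 flow and is sent as a bearer token.

```python
import json
import urllib.request

# Hypothetical API endpoint; substitute your provider's documented URL.
API_BASE = "https://cloud.example.com/v1"


def build_create_server_request(token, name, cores):
    """Build (but do not send) a POST request asking for a new virtual server."""
    body = json.dumps({"name": name, "cores": cores}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/servers",
        data=body,
        method="POST",
        headers={
            # OAuth2 access tokens are commonly sent as bearer tokens.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_create_server_request("example-token", "worker-1", 16)
print(request.get_method(), request.full_url)
# Against a real API, urllib.request.urlopen(request) would send it.
```

Building the request separately from sending it makes it easy to inspect exactly what your code is about to ask the cloud to do.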
Learn some configuration management tools. Chef and Puppet are two of the most popular. Learn how to write code in both of these tools, and learn how you can access that code from your own code.
The days of being in separate groups are gone. We have to get along and learn more about each other's fields. Fifteen years ago, I never imagined I would become an expert at installing Linux and configuring ssh. But now I need this as part of my software development job, because I'm writing distributed cloud-based software. It's now a job requirement and yes, we can all just get along.