Why the Open Source Cloud Is Important

By

-

December 6, 2016

In previous years, we have distinguished between open source cloud and others. But as cloud technologies have evolved, it’s evident that any cloud without open source would be the equivalent of an automobile without an engine.

In 2006, we distinguished heavily between public and private cloud and open source and closed. Today the conversation has evolved into one cloud fabric of which open source has become an integral part.

Perhaps what has most notably changed is that the initial cloud conversations about capex (capital expenditures) versus opex (operating expenditures) and the actual costs to deploy the cloud are now taking into account the advantages of improved agility and customization. Where open source has traditionally sparked interest because of its free nature (as in no acquisition cost), it’s now being lauded for the much harder-to-measure but much greater benefit of faster speed to value.

We also see an improved return on investment for those companies that participate in open source rather than only consume it. They need to invest, and are actively investing, in the future direction of the open technology they rely upon, rather than remaining passive and opportunistic.

Industry standards and participation are needed

It would be easy, then, to say that open source has won the cloud. Game over. But along with openness in software, there is an overwhelming need for openness across cloud architectures. And while emerging technologies and trends such as containers have done a lot to improve interoperability among components and ensure application portability, much work remains to ensure the trend toward openness and standardization continues.

To this end, foundations such as the Cloud Foundry Foundation, Cloud Native Computing Foundation (CNCF) and Open Container Initiative (OCI) at The Linux Foundation are actively bringing in new open source projects and engaging member companies to create industry standards for new cloud-native technologies. The goal is to help improve interoperability and create a stable base for container operations on which companies can safely build commercial dependencies.

When work happens in the open, companies that participate are better able to compete in rapidly changing markets and the entire industry benefits from the increased innovation. That also means companies that do not use and participate in open source cloud projects will fall behind. By harnessing the power of shared R&D, companies that participate in open source benefit from:

• Improved code quality

• Increased security with the ability to find and fix vulnerabilities

• Visibility into every layer of the infrastructure

• Code access in order to add features and influence the direction of the technology

• Insurance against lock-in through portability to other platforms

• Lower cost through shared development

• And more.

No single company could develop the technologies on this list on its own. Without open source collaboration, the open cloud we know today would not exist.

We urge companies that rely on cloud computing, and the open source technologies that comprise the cloud, to become familiar with and contribute to the projects and communities behind them.

Contributing knowledge and code to open source projects not only helps companies meet their business objectives, but it creates thriving communities that keep projects strong and relevant over time, advances the technology, and benefits the entire open source cloud ecosystem.

Learn more about trends in open source cloud computing and see the full list of the top open source cloud computing projects. Download The Linux Foundation’s Guide to the Open Cloud report today!

Read the other articles in the series:

4 Notable Trends in Open Source Cloud Computing

Trends in the Open Source Cloud: A Shift to Microservices and the Public Cloud

3 Emerging Cloud Technologies You Should Know

How Virtualized Networks Will Save Us From Dropped Calls

By

OPNFV

-

December 6, 2016

We’ve all been the victim of a dropped mobile phone call and know how frustrating it can be. However, virtualized networks provide network operators with powerful tools to detect and recover from network disruptions, or “faults,” that can drop calls for thousands of subscribers simultaneously. The Open Platform for Network Functions Virtualization (OPNFV) project, together with OpenStack, has developed features in software that add resiliency to mobile networks and enable them to recover from network and other outages.

At the recent OpenStack Summit in Barcelona, both groups demonstrated how new technologies in NFV can help minimize network disruptions. During the keynotes, technical leads from the OPNFV Doctor Project and OpenStack Vitrage project conducted a phone call using a 4G mobile system running on top of OpenStack. The mobile call continued without disruption even after a dramatic cutting of network cables. (You can watch the short demo in its entirety below.)

To get the skinny on how the technology works and what it took to pull off such a compelling demo, we sat down with folks involved with OPNFV, OpenStack and the Doctor project, including Ifat Afek (System Architect at Nokia Cloudband), Carlos Goncalves (Software Specialist at NEC), Ryota Mibu (Assistant Manager at NEC), and Ildiko Vancsa (Ecosystem Technical Lead at OpenStack Foundation).

OPNFV: Can you give an overview of the demo you did at OpenStack Summit?

OPNFV/OpenStack demo team: We performed two live mobile calls from stage and both were interrupted. The first call dropped when Mark Collier (COO at OpenStack Foundation) removed two cables from the servers powering the mobile system for the calls. After this failed call, Ryota Mibu enabled the OPNFV Doctor features and the teams made another call. During the second call, Mark cut the network cables with giant scissors, but this time the call continued without disruption.

The demo leverages OpenStack as the base for a 4G mobile system equipped with the functionality to perform a smooth failover in case of faults in the system (in a process called “Fault Management”). OpenStack laid the foundation for the cloud-based mobile platform, and OPNFV, via the Doctor Fault Management project, filled the existing feature gaps and provided system integration. While we successfully showed how OpenStack operates in an NFV/Telecom environment, the demo was also an example of the fruitful collaboration between the OpenStack and OPNFV communities, as development of the new features and additions was driven through Doctor “upstream” into OpenStack.

OPNFV: Can you talk a little more about fault management and why it’s important?

Demo team: There is no system without faults, errors, and failures, even in the cloud. Fault management is a component that allows operations teams to monitor, detect, and isolate faults and to automate recovery from them. With an efficient fault management system, countermeasures can negate the effects of any deployment faults, avoiding bad user experiences or violation of service-level agreements (SLAs).
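The detect, isolate, and recover steps described above can be sketched in a few lines of Python. This is only an illustrative model, not the Doctor or OpenStack API: the `Host`, `Service`, and `handle_faults` names, and the host names in the example, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    healthy: bool = True      # last health state reported by monitoring
    in_service: bool = True   # becomes False once the host is isolated


@dataclass
class Service:
    name: str
    active: str                                    # host serving traffic now
    standbys: list = field(default_factory=list)   # spare hosts, in order


def handle_faults(hosts, services):
    """Isolate unhealthy hosts, then fail affected services over."""
    actions = []
    # Detect and isolate: take every unhealthy host out of service.
    for host in hosts.values():
        if not host.healthy and host.in_service:
            host.in_service = False
            actions.append(f"isolated {host.name}")
    # Recover: move each affected service to its first usable standby.
    for svc in services:
        if hosts[svc.active].in_service:
            continue
        for candidate in svc.standbys:
            if hosts[candidate].in_service and hosts[candidate].healthy:
                svc.standbys.remove(candidate)
                svc.active = candidate
                actions.append(f"{svc.name} failed over to {candidate}")
                break
    return actions


if __name__ == "__main__":
    hosts = {n: Host(n) for n in ("compute-1", "compute-2")}
    call = Service("voice-call", active="compute-1", standbys=["compute-2"])
    hosts["compute-1"].healthy = False   # e.g. its network cable was cut
    print(handle_faults(hosts, [call]))
```

In a real deployment the health flag would be set by a monitoring system and the failover would be carried out by the platform; the sketch only shows the order of the steps.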

To put this in perspective, think about the impact to network services during natural disasters or other emergencies. According to a report by NTT DOCOMO, the largest mobile phone operator in Japan, thousands of antennas and other infrastructure equipment went out of service as a result of the magnitude 9.0 earthquake and tsunami in March of 2011. The consequences, as we all know, were devastating. Millions of mobile subscribers were disconnected from the cellular network, unable to make emergency calls or check in with loved ones.

Service continuity of virtualized platforms has to be equally addressed. The features enabled by OPNFV and OpenStack add value toward helping operators quickly recover from small to large-scale faults, ultimately keeping our societies connected in times of need.

OPNFV: How can organizations implement Doctor’s Fault Management solution in their networks?

Demo team: Doctor is not standalone software that can be downloaded and installed directly; the core Doctor framework relies on OpenStack components. Any organization deploying recent versions of OpenStack (from Liberty onward) will have Doctor-prescribed enhancements already available out-of-the-box with little to no configuration. In other words, Doctor is now a part of OpenStack.

Extensive documentation covering requirements, use cases, gap analysis, architecture, design decisions, configuration and user guides is available. Head to the OPNFV Colorado 2.0 Doctor documentation page on OPNFV.org for details.

OPNFV: Are there other use cases for Doctor that go beyond telecom? Will it work with other types of networks?

Demo team: Yes, definitely! There are a number of interesting cloud and enterprise applications that can use the framework, for example those with time constraints in the area of multimedia and real-time applications (for faster replacement of a video cache associated with peak user times). The OpenStack-powered fault management framework will be useful for anyone operating within contracted SLAs.

Individually developed features can also be used beyond fault management scenarios. For example, event alarms can be leveraged for quicker triggering of administrative actions. Without this feature, events (or “faults”) can only be retrieved by periodically polling data from a database. In fact, before Doctor, the time required to detect and recover from a fault was a few minutes. With Doctor, the time to recovery is less than one second!
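The difference between polling a store for events and being notified by an event alarm can be illustrated with a small in-process Python model. It is a sketch under stated assumptions, not OpenStack code: `AlarmBus`, `poll`, and the event string are hypothetical names chosen for the example.

```python
class AlarmBus:
    """Tiny in-process stand-in for an event-alarm notifier."""

    def __init__(self):
        self._subscribers = []
        self.log = []   # events are also stored, so a poller can read them

    def subscribe(self, callback):
        """Register a function to call as soon as an event is published."""
        self._subscribers.append(callback)

    def publish(self, event):
        """Record the event and push it to every subscriber right away."""
        self.log.append(event)
        for callback in self._subscribers:
            callback(event)


def poll(bus, already_seen):
    """Polling model: return the events recorded since the last poll."""
    return bus.log[already_seen:]


if __name__ == "__main__":
    bus = AlarmBus()
    received = []
    bus.subscribe(received.append)
    bus.publish("host compute-1 down")
    # The subscriber has the event already; a poller only learns of it
    # the next time it calls poll(), which may be a full interval later.
    print(received, poll(bus, 0))
```

With polling, the detection delay is bounded by the polling interval; with a pushed alarm, the reaction can start as soon as the event is published.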

OPNFV: What’s next for the Doctor project? Are there other cool implementations we can expect to see in 2017?

Demo team: We certainly hope so, but it will be hard to top our Barcelona demo! As a project and a part of a larger community, maintenance and continuous improvements to the functionality of fault monitoring, notification and handling are needed and planned for in OpenStack. And as integrators, the community needs rich monitoring functions that can be supported by the broader OpenStack/OPNFV ecosystem.

Recently, new open source communities have surfaced that aim to develop higher-layer network function management and orchestration systems. OPNFV has been supportive of these activities, and a plan to integrate them in the platform is on the horizon. That said, we may see Doctor joining additional collaborative efforts at some point.

OPNFV: Most importantly: How did Mark get those giant scissors through airport security?

Demo team: Mark made all of us sign a nondisclosure agreement that prevents us from sharing any details! (It was either that or he would sabotage the demo…)

For more details, please visit OPNFV and OpenStack NFV on the web or follow @opnfv on Twitter.

Status of Embedded Linux: Tim Bird Warns of Slow Progress on Linux Shrinkage

By

Eric Brown

-

December 6, 2016

As Chair of the Architecture Group of The Linux Foundation’s CE Working Group, Tim Bird has long been the amiable public face of the Embedded Linux Conferences, which he has run for over a decade. At the recent ELC Europe event in Berlin, Bird gave a “Status of Embedded Linux” keynote in which he discussed the good news in areas like GPU support and virtually mapped kernel stacks, as well as the slow progress in boot time, system size, and other areas that might help Linux compete with RTOSes in IoT leaf nodes.

Bird also opened ELCE with welcoming remarks and closed it with a Closing Game trivia show. Did you know that Linus Torvalds was once bitten by a penguin, or that his father was a member of the European Parliament? Or that Linux has not yet made it to the surface of Mars? Now you do. (See the video below.)

Bird launched his talk by noting the improving cadence consistency of kernel releases, now running between 63 and 70 days. More good news: When Greg Kroah-Hartman, whom Bird interviewed in an ELCE fireside chat, announced the next LTS release in advance for the first time, developers restrained themselves from rushing to cram patches into it. Kernel v4.9 LTS is due in early December.

Indeed, Linux has matured, as befits an OS that by some counts has been injected into 1.5 billion objects. Bird thinks it may actually be more than 2 billion by now, although nobody knows for sure.

In any case, the status of embedded Linux is “great,” says Bird. That doesn’t stop him from worrying about the future. “Everyone knows IoT gateways are going to run Linux, but I worry that Linux is not going to run on those 9 billion leaf nodes they’re expecting for IoT,” he said. “I worry that Linux won’t be the first OS running Minecraft on a cereal box.”

To achieve the tuxified cereal box of his dreams, Bird estimates that costs must be reduced to $1.10. Half of that would go to the display, while 40 percent would be consumed by CPU, RAM, and flash. The rest would cover a battery and input device. “Today we’re still at $5 for CPU and memory alone,” he noted.
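The budget split quoted above works out as follows. The sketch uses only the figures from the talk ($1.10 total, half for the display, 40 percent for CPU, RAM, and flash, and today’s $5 for CPU and memory); the variable names are illustrative.

```python
# Work in whole cents to avoid floating-point rounding.
TARGET_CENTS = 110                      # $1.10 total target

display = TARGET_CENTS * 50 // 100      # half goes to the display
compute = TARGET_CENTS * 40 // 100      # 40 percent: CPU, RAM, and flash
remainder = TARGET_CENTS - display - compute   # battery and input device

budget = {
    "display": display,
    "cpu_ram_flash": compute,
    "battery_and_input": remainder,
}

# "Today we're still at $5 for CPU and memory alone."
TODAY_COMPUTE_CENTS = 500
gap = TODAY_COMPUTE_CENTS / compute     # how many times too expensive

if __name__ == "__main__":
    for part, cents in budget.items():
        print(f"{part}: ${cents / 100:.2f}")
    print(f"CPU and memory must get about {gap:.1f}x cheaper")
```

That leaves 55 cents for the display, 44 cents for CPU, RAM, and flash, and 11 cents for the battery and input device, so CPU and memory alone would need to become roughly eleven times cheaper than the $5 Bird cites.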

The point is that RTOSes are likely to get there first. Bird, who is a Senior Software Engineer at Sony, mentioned a recent Sony audio player project in which NuttX beat out Linux because “it’s easier to add stuff to NuttX than to trim down Linux.”

Even if Linux may not beat RTOSes to the cereal box market, it troubles Bird that Linux is not being more aggressively extended to capture more of the IoT endpoint market. “There’s been a ton of driver work on CPUs, GPUs, and embedded devices in this year’s kernels, which is great,” said Bird, “but not much on features like boot time, system size, or embedded filesystems.”

Bird noted a decrease in both kernel submissions and ELC and ELCE talks on topics that dominated the first few years of ELC: boot time, system size, file systems, power management, real-time, and security. While numerous lightweight Linux distros have emerged for IoT, there has been little progress on the kernel side to reduce footprint. Not much is going on with the Linux Kernel Tinification or Linux Tiny projects. “We haven’t seen much new since Linux 4.1 when they got rid of users and groups, saving about 25K.”

If embedded open source development continues to expand beyond Linux to RTOSes like NuttX, FreeRTOS, Mbed, and the Linux Foundation’s Zephyr, fragmentation will only increase, argued Bird. “We already have way too many embedded Linux distributions. That makes it hard to share non-kernel stuff like system-wide and feedback-directed optimizations or security enhancements. We need to find ways to share our package management and our test capabilities.” Bird is hardly anti-RTOS, however, stating: “It’s a really big deal that Linaro announced support for Zephyr.”

As IoT developers increasingly work with both Linux and RTOSes, they must contend not only with more technologies to integrate, but also with the permissive non-GPL licenses, such as BSD, that RTOSes increasingly use. “We have too many OSes with different licenses,” said Bird, before recommending one admittedly controversial response: dual-licensing code for GPL and BSD.

Generalization vs. specialization

All these issues are played out against a struggle in embedded Linux between generalization and specialization. “In open source we want to generalize to get the network effects of a big group of collaborators,” said Bird. “The device tree is moving the kernel toward greater generalization, with drivers written to handle all possible IP block configurations across multiple CPUs. But in embedded we want to specialize and make our devices as efficient, power-light, and cost effective as possible. But then you lose that community effect.”

This tension has limited the progress of technologies like faster boot times and smaller footprints. “We can do fast boot, but most techniques use kernel specializations that are rejected upstream. Boot time is unique per platform, and reductions tend not to be mainlinable.” The problem is that improvements like fast boot and Linux tinification require subtractive vs. additive engineering. “If you try to rip Linux apart, you end up with Franken-Linux. You can’t pull the pieces apart cleanly.”

Bird has recently been deeply involved in testing automation, where he is leading an LTSI project called Fuego. “Every company builds up their own test, which leads to fragmentation,” said Bird. “Testing automation could help make up for some of the loss of community involved with specialized software. We need to share not only test packages but test experience.”

Staying true to open source principles can help solve these challenges, said Bird. “Look at other projects to find commonality. Find a way to share ideas at a minimum and code if you can. And keep working on upstreaming.”

Highlights of Embedded Linux 2016

In addition to grappling with the big picture, Bird gave a detailed breakdown of embedded Linux progress in specific segments. He also summarized the embedded highlights of recent kernel releases, such as LightNVM in Linux 4.4, ARM multiplatform support in Linux 4.5, and a timer wheel update in Linux 4.8.

Linux 4.9 will bring a technology called virtually mapped kernel stacks, which helps detect stack overruns, clean up kernel code, and speed process creation. “Being able to catch stack overflows inside the kernel is a huge deal,” said Bird. “It’s a level of robustness we’ve never seen before.”

Here’s a run-through of some 2016 embedded Linux trends highlighted by Bird, both inside and outside the kernel project:

Boot-up Time – As noted, not much is shaking. Intel’s XIP (eXecute-In-Place) for x86 “was welcome,” but “asynchronous probing didn’t really go anywhere.”

Device Trees – “Overlays seem to be working as intended,” but validation is stalled. Updating the device tree spec is under discussion.

Graphics – Vulkan API v1.0 from Khronos Group provided a welcome alternative to Direct3D or OpenGL with less CPU and GPU overhead. “AMD plans to open source the driver, and Intel and Valve are already working on it. Nvidia supports it.” The bad news: Qt changed its license from LGPL 2.1 to 3.0, which is “undesirable for many consumer electronics products,” said Bird. “It has a lot of people in our industry worried.”

GPUs – “Freedreno (Adreno) and Etnaviv (Vivante) have really made progress with free drivers.” There’s also been work on the Raspberry Pi’s Broadcom VC4 GPU. The bad news: there’s nothing new from the Lima project (ARM Mali), and nothing yet on the PowerVR front.

File systems – To address the trend toward opaque “black-box” block-based storage used in eMMC, solutions have emerged like LightNVM, a framework for holding SSD parameters. LightNVM allows the kernel to “move the flash translation layer from the black-box hardware up into the software where you have visibility.” Also: “Free Electrons is doing some good work on UBIFS handling of MLC NAND.”

Networking – Bluetooth 4.2 support added better security, faster speed, and 6LoWPAN mesh networking integration. There has also been work on IoT protocols like Thread.

Real-time Linux – The latest RT-preempt was released with Linux 4.8, and “Thomas Gleixner says there are only 10K lines left. I think it will be more than that.” Also, Xenomai 3.0.1 has arrived with a new Cobalt core.

Security – Not much transpired in 2016. However: “A new kernel security hardening project is addressing classes of problems instead of individual bugfixes.”

System Size – Not much happened in 2016, although the XIP patches helped out here as well. Going forward: “Nicolas Pitre is doing some interesting work on gcc --gc-sections, and Vitaly Wool is working on stuff.”

Testing – There has been plenty of work done on Kselftest, LAVA V2, Fuego, and Kernelci.org, which Bird calls “the most successful, public, distributed Linux test system in the world.”

Toolchain – “Khem Raj is doing interesting work in Yocto Project for Clang.”

Tracing – eBPF is being used for dynamic tracing, and there’s a new tracefs filesystem, which is no longer part of debugfs. There’s also been work on Ftrace histogram triggers.

For more information, watch the complete video below.

https://www.youtube.com/watch?v=iRrZVWVL_KE?list=PLbzoR-pLrL6pRFP6SOywVJWdEHlmQE51q

Embedded Linux Conference + OpenIoT Summit North America will be held on February 21 – 23, 2017 in Portland, Oregon. Check out over 130 sessions on the Linux kernel, embedded development & systems, and the latest on the open Internet of Things.

Linux.com readers can register now with the discount code LINUXRD5 for 5% off the attendee registration price.

Docker CEO: Docker Already Is a Security Platform (with Swarm, That Is)

By

The New Stack

-

December 6, 2016

In a reinforcement of his company’s marketing message that containerization as an architecture is more secure by design, Docker Inc. CEO Ben Golub [pictured right above, with HPE Executive VP Antonio Neri] told attendees at HPE’s Discover London 2016 event last Tuesday morning that the Docker platform addresses and ameliorates its users’ security concerns just by its very architecture.

“What we’ve heard from the most security-conscious organizations on the planet who are using Docker, is that they’re using Docker not in spite of security concerns, but in order to address the security concerns,” Golub told attendees.

Why Is C Programming Language Continuously Going Down?

By

FossBytes

-

December 6, 2016

C has ruled the programming world for a long period, becoming the base of many operating systems and programs. However, over the course of the past year, its popularity has fallen, probably due to the lack of a corporate sponsor and the increased use of newer languages.

C is a general-purpose programming language that was developed by Dennis M. Ritchie in 1972 at the Bell Telephone Laboratories. It was then used to develop Unix. Since then, it has laid the foundation of many other operating systems and popular computer programs….

Google DeepMind Makes AI Training Platform Publicly Available

By

Bloomberg Technology

-

December 6, 2016

Alphabet Inc.’s artificial intelligence division Google DeepMind is making the maze-like game platform it uses for many of its experiments available to other researchers and the general public.

DeepMind is putting the entire source code for its training environment — which it previously called Labyrinth and has now renamed as DeepMind Lab — on the open-source depository GitHub, the company said Monday. Anyone will be able to download the code and customize it to help train their own artificial intelligence systems. They will also be able to create new game levels for DeepMind Lab and upload these to GitHub.

Eight Great Linux Gifts for the Holiday Season

By

ZDNet

-

December 6, 2016

Do you want to give your techie friend a very Linux holiday season? Sure you do! Here are some suggestions to brighten your favorite Tux fan’s day.

1) Tux

Every Linux fan should have at least one stuffed Tux, Linux’s mascot, in their home or office. Tux stuffies aren’t as common as they once were, but Linux PC vendor ZaReason still has a very nice snuggling Tux…

Kubernetes High Availability Setup Using Ansible

By

Pawan Kamboj

-

December 6, 2016

I have created an Ansible module to create a highly available (HA) Kubernetes cluster with the latest release, 1.4.x, on CentOS 7.x.

You can use this module to install a Kubernetes HA cluster with just one click, and your cluster will be ready in a few minutes.

There are 8 roles defined in this Ansible module.

addon – Use this role to create Kubernetes addon services such as kube-proxy, kube-dns, kube-dashboard, weavnet, weavescope-ui and grafana/influxdb. This role should be called after the cluster is fully operational.
docker – Use this role to install the latest Docker version. It installs Docker on all cluster nodes, as Docker is required for all Kubernetes cluster members.
etcd – This role installs an etcd cluster. Both secure and unsecured clusters are supported; choose whichever you want to install.
haproxy – This role sets up an HAProxy load balancer (LB) for the Kubernetes API service. Use it if you don’t have any other LB available. It is not required for a single-node cluster.
master – Use this role to set up the Kubernetes master services: kube-apiserver, kube-controller and kube-scheduler. All these services run as pods on all master nodes. Both the controller and the scheduler are configured in HA mode.
node – This role installs kubelet on all cluster nodes and also creates the SSL certificate required to communicate with the master components.
sslcert – This role creates all SSL certificates required to run a secure K8S cluster. It creates certificates for the API service, etcd, and the admin account.
yum-repo – This role installs the EPEL and kubernetes-1.4 package repos on all Kubernetes servers.

Follow the steps below to create a Kubernetes HA setup on CentOS 7.

Prerequisites:

Ansible
All Kubernetes master and node hosts should allow password-less access from the Ansible host

Download the Kubernetes-Ansible module from the following GitHub location:

https://github.com/pawankkamboj/HA-kubernetes-ansible

Set up the variables according to your requirements in the group variable file all.yml, and add your hosts to the inventory file.

Run the cluster.yml playbook to create the Kubernetes HA cluster.

For example, if we have two master servers, the playbook will deploy the API, controller, and scheduler services on both of them in HA mode. The controller and scheduler can run in HA mode using the --leader-elect option, but to run the API in HA mode we need a load balancer so that API traffic is forwarded to the API servers.
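The two HA approaches mentioned above, leader election for the controller and scheduler and load balancing for the API servers, can be illustrated with a toy Python model. This is a conceptual sketch only and does not reproduce how Kubernetes or HAProxy implement either mechanism: the `Lease` class, `round_robin` helper, and the node names are hypothetical.

```python
import itertools


class Lease:
    """A single shared lock record with an expiry time."""

    def __init__(self, ttl):
        self.ttl = ttl            # seconds a lease stays valid
        self.holder = None        # current leader, if any
        self.expires_at = 0.0

    def try_acquire(self, candidate, now):
        """Acquire or renew the lease; return True if candidate leads."""
        free_or_mine = self.holder in (None, candidate)
        if free_or_mine or now >= self.expires_at:
            self.holder = candidate
            self.expires_at = now + self.ttl
            return True
        return False              # someone else holds a valid lease


def round_robin(backends):
    """Return an endless iterator that rotates over the API servers."""
    return itertools.cycle(backends)


if __name__ == "__main__":
    lease = Lease(ttl=10)
    print(lease.try_acquire("master-1", now=0))    # first caller leads
    print(lease.try_acquire("master-2", now=5))    # must wait
    print(lease.try_acquire("master-2", now=12))   # master-1 stopped renewing

    api = round_robin(["master-1:6443", "master-2:6443"])
    print([next(api) for _ in range(3)])
```

Only one controller or scheduler instance acts at a time while the others wait for the lease, whereas every API server can serve requests concurrently, which is why the API needs a load balancer in front of it rather than an election.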

Note – Addon roles should be run after cluster is fully operational.

Essentials of OpenStack Administration Part 1: Cloud Fundamentals

By

Linux Training Staff

-

December 5, 2016

Start exploring Essentials of OpenStack Administration by downloading the free sample chapter today. Download Now

OpenStack and cloud computing offer a way of automating and virtualizing a traditional data center that allows for a single point of control and a single view of what resources are being used.

Cloud computing is an important part of today’s data center, and having the skills to deploy, work with, and troubleshoot a cloud is essential for sysadmins today.

Some 51 percent of hiring managers say experience with or knowledge of OpenStack and CloudStack is driving open source hiring decisions, according to the Open Source Jobs Report from The Linux Foundation and Dice.

The Linux Foundation’s online Essentials of OpenStack Administration course teaches everything you need to know to create and manage private and public clouds with OpenStack. In this tutorial series, we’ll give you a sneak preview of the second session in the course on Cloud Fundamentals. Or you can download the entire chapter now.

The series covers the basic tenets of cloud computing and takes a high-level look at the architecture. You’ll also learn the history of OpenStack and compare cloud computing to a conventional data center.

By the end of the tutorial series, you should be able to:

• Understand the solutions OpenStack provides

• Differentiate between conventional and cloud data center deployments

• Explain the federated nature of OpenStack projects

In part 1, we’ll define cloud computing and discuss different cloud services models and the needs of users and platform providers.

What is cloud computing?

Cloud Computing is a blanket term that may mean different things in different contexts. For example, in science it refers simply to distributed computing, where you run an application simultaneously on two or more connected computers. However, in common usage it might refer to anything from the Internet itself to a certain class of services offered by a single company.

Users and platform providers typically mean different things when they discuss the cloud. Users think of a place on the Internet where they can upload things. For platform providers, clouds are infrastructure projects that allow data centers to be much more efficient than they were previously. The latter is the focus of the Essentials of OpenStack Administration class.

You may have also heard of the following terms:

• Infrastructure as a Service (IaaS)

• Platform as a Service (PaaS)

• Software as a Service (SaaS)

The three terms refer to three common service models offered by cloud vendors such as Amazon or Rackspace, where IaaS is the most basic but flexible one, and the others progressively mask the “dirty details” from the user, trading flexibility for ease-of-use.
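One way to picture how the three models progressively hide detail is to list which layers of a stack remain the customer’s responsibility. The layer names and the cut-off points below are a simplified assumption for illustration, not a definition taken from the course.

```python
# Simplified stack, from the bottom (provider side) to the top.
LAYERS = [
    "hardware",
    "virtualization",
    "operating system",
    "runtime",
    "application",
]

# Index of the first layer the customer manages under each model
# (assumed split; real offerings draw these lines differently).
CUSTOMER_STARTS_AT = {"IaaS": 2, "PaaS": 4, "SaaS": 5}


def customer_managed(model):
    """Return the layers the customer still has to look after."""
    return LAYERS[CUSTOMER_STARTS_AT[model]:]


if __name__ == "__main__":
    for model in ("IaaS", "PaaS", "SaaS"):
        print(model, customer_managed(model))
```

The shrinking list from IaaS to SaaS mirrors the trade described above: the fewer layers the customer manages, the easier the service is to use and the less it can be customized.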

Platform Services

Platform Providers have goals when providing IT services, such as:

• Delivering excellent customer service.

• Providing a flexible and cost-efficient infrastructure.

If a provider fails to deliver excellent customer service, customers will look for alternatives. Cost-efficiency is always the bottom line. No one wants to spend millions on infrastructure that is static.

Infrastructure service customers will also have some requirements of their own:

• Stability, reliability, flexibility of the service…

• … for as little money as possible.

The phrase “wire once, deploy many” sums up the goal of an infrastructure provider. From the customer perspective, all of the various components are presented through an easy-to-use software interface. The use of this interface allows the customer to start new virtual machines, attach storage, attach network resources, and shut the instances down, all without having to open a ticket. This allows for more flexibility for the customer. The infrastructure provider can then focus on providing good customer service, lowering costs through consolidation and on meeting the ongoing resource requirements of one or more customers.

Catering to Both Providers and Customers

As you can see, both platform providers and their customers have very similar requirements. The key to catering to both is automation: it facilitates both flexibility and cost-effectiveness. We will get into a lot more detail on it later on.

In Part 2 of this series, we’ll see what conventional, un-automated infrastructure offerings look like, and Part 3 looks at existing cloud solutions.

Read the other parts of this series:

Essentials of OpenStack Administration Part 4: Cloud Design, Software-Defined Networking and Storage

Essentials of OpenStack Administration Part 5: OpenStack Releases and Use Cases

The Essentials of OpenStack Administration course teaches you everything you need to know to create and manage private and public clouds with OpenStack. Download a sample chapter today!

PayPal Cuts Costs 10x With Open Source CI

By

Carla Schroder

-

December 5, 2016

The bigger you are, the more small efficiencies add up. Manivannan Selvaraj’s talk from LinuxCon North America gives us a detailed inside view of how PayPal cut operating costs by a factor of ten, while greatly increasing performance and user convenience.

Everything has to be fast now. We can’t have downtimes. No going offline for maintenance, no requesting resources with a days-long ticketing process. Once upon a time virtual machines were the new miracle technology that enabled more efficient resource use. But that was then. Selvaraj describes how PayPal’s VMs were operating at low efficiency. They started with a single giant customized Jenkins instance running over 40,000 jobs. It was a single point of failure, not scalable, and inflexible.

The next iteration was individual VMs running Jenkins for each application, which was great for users, but still not an optimal use of hardware. Selvaraj notes that, “Only 10% were really used. The rest of the time, the resources were idle and if you think about 2,500 virtual machines, it’s millions of dollars invested in hardware. So, although it solved the problem of freedom for users and removed the single point of failure, we still had the resource management issue where we didn’t use the resource optimally.”
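A back-of-the-envelope calculation shows why that utilization figure matters. The sketch takes the two numbers from the quote (2,500 virtual machines, 10% really used) and assumes, purely for illustration, that utilization is uniform and that workloads could be packed perfectly; neither assumption comes from the talk.

```python
VM_COUNT = 2500          # virtual machines mentioned in the talk
UTILIZATION_PCT = 10     # "Only 10% were really used"

# Integer arithmetic keeps the result exact.
busy_equivalent = VM_COUNT * UTILIZATION_PCT // 100   # fully used VMs
idle_equivalent = VM_COUNT - busy_equivalent          # capacity sitting idle

# With perfect packing (an idealized assumption), this is how many
# times less capacity the same work would need.
consolidation_factor = 100 // UTILIZATION_PCT

if __name__ == "__main__":
    print(f"busy-equivalent VMs: {busy_equivalent}")
    print(f"idle-equivalent VMs: {idle_equivalent}")
    print(f"ideal consolidation: {consolidation_factor}x")
```

Under those assumptions the work of 2,500 VMs would fit in about 250 fully used ones, leaving the equivalent of 2,250 machines idle, which is the resource management problem Selvaraj describes.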

Docker is the key

The solution was a continuous integration (CI) system built on Git, Docker, Mesos, Jenkins, Aurora, and the Travis CI API. Docker is the key to making it all work the way they want. Selvaraj explains how Docker provides five key benefits: task isolation, elimination of host dependencies, reproducibility, portability, and cloud-native operation.

Selvaraj says, “Once we decided that Docker is the way to go, we started dockerizing most of our applications. We have dockerized CI API, which is our orchestration engine, which takes in the CI provisioning request, creates the CI to the user. We have dockerized the Jenkins master. We have dockerized Jenkins slaves. So, everything is running in Jenkins, in Docker, so that we don’t really rely anything on the host and it’s very easy from our maintenance perspective.”

Selvaraj shares a wealth of great insights on PayPal’s CI infrastructure in the conference video (below) and gives a live demonstration.