Everyone uses %CPU to measure performance, but everyone is wrong, says Netflix’s Brendan Gregg in his UpSCALE Lightning Talk.
CPU utilization is the metric everyone uses to measure a processor’s performance. But %CPU is a misleading measure of how busy your processor really is, says Brendan Gregg, senior performance architect at Netflix, in what he calls a “five-minute public service announcement,” at the 16th annual Southern California Linux Expo (SCALE).
Watch Brendan’s talk to learn how you can use Netflix’s methods to determine what your CPUs are really doing to impact performance.
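The core point of the talk is that a CPU reported as busy may in fact be stalled waiting on memory rather than doing useful work. One rough way to check this yourself (a sketch, not necessarily the exact commands from the talk) is to look at instructions per cycle (IPC) with perf:

```bash
# Count cycles and instructions system-wide for 10 seconds.
# Requires the perf tool (linux-tools) and usually root privileges.
perf stat -a -e cycles,instructions -- sleep 10

# If instructions per cycle comes out well below ~1, the "busy" CPUs are
# likely stalled (often on memory) rather than retiring useful instructions.
```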
There’s a nifty feature that comes with VirtualBox that allows you to create a desktop shortcut for virtual machines. This, however, doesn’t do you much good if you’re running VirtualBox on a GUI-less server, and you don’t want to have to type out the full command to start a VM every time it’s needed. So what do you do? If you’re using Linux to host VirtualBox VMs, it’s really quite simple: you create bash scripts to manage the starting, stopping, and resuming of those virtual machines.
I’m going to show you how to do just that. I’ll assume you already have VirtualBox installed along with all the virtual machines you need. With that said, let’s see how this is done.
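Here is a minimal sketch of what such a script might look like, assuming a VM registered under the placeholder name "ubuntu-server" (substitute whatever VBoxManage list vms reports on your host):

```bash
#!/bin/bash
# vm.sh - manage a headless VirtualBox VM from the command line.
# "ubuntu-server" is a placeholder; list your VMs with: VBoxManage list vms
VM="ubuntu-server"

case "$1" in
  start)
    # Boots the VM without a GUI window; also resumes a previously saved state
    VBoxManage startvm "$VM" --type headless
    ;;
  save)
    # Saves the VM state to disk so it can be resumed later with "start"
    VBoxManage controlvm "$VM" savestate
    ;;
  stop)
    # Hard power-off, like pulling the plug
    VBoxManage controlvm "$VM" poweroff
    ;;
  status)
    VBoxManage list runningvms
    ;;
  *)
    echo "Usage: $0 {start|save|stop|status}"
    exit 1
    ;;
esac
```

Because startvm resumes a saved state, the same start action doubles as resume; make the script executable with chmod +x and you have a one-word command per VM.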
As part of preparing my last two talks at LCA on the kernel community, “Burning Down the Castle” and “Maintainers Don’t Scale”, I have looked into how the kernel’s maintainer structure can be measured. One very interesting approach is looking at the pull request flows, as done, for example, in the LWN article “How 4.4’s patches got to the mainline”. Note that in the Linux kernel process, pull requests are only used to submit development from entire subsystems, not individual contributions. What I’m trying to work out here isn’t so much the overall patch flow, but how maintainers work, and how that differs between subsystems.
Methodology
In my presentations I claimed that the kernel community is suffering from hierarchies that are too steep. Worse, the people in power don’t bother to apply the same rules to themselves as to everyone else, especially around purported quality enforcement tools like code reviews.
For our purposes, a contributor is someone who submits a patch to a mailing list but needs a maintainer to apply it for them to get the patch merged. A maintainer, on the other hand, can directly apply a patch to a subsystem tree and will then send pull requests up the maintainer hierarchy until the patch lands in Linus’ tree. This is relatively easy to measure accurately in git: if the recorded patch author and committer match, it’s a maintainer self-commit; if they don’t match, it’s a contributor commit.
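That measurement can be reproduced on any kernel checkout; here is a minimal sketch (not the exact scripts behind the talks) that counts self-commits versus contributor commits for a placeholder release range:

```bash
# Compare author and committer e-mail for every non-merge commit in a range.
# The range v4.15..v4.16 is just an example; pick the releases you care about.
git log --no-merges --pretty='%ae %ce' v4.15..v4.16 |
awk '{ if ($1 == $2) self++; else other++ }
     END { printf "maintainer self-commits: %d\ncontributor commits: %d\n", self, other }'
```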
As we move forward, it’s becoming increasingly clear (to me at least) that the future will be containerized and those containers will run on serverless infrastructure.
In this context, then, the obvious question is: “What becomes of orchestration in this serverless future?”
Kubernetes is a technology developed to provide a serverless-like experience for running containers. But the truth is that at the low level, the Kubernetes architecture itself is deeply aware of individual machines, and components from the scheduler to the controller manager assume that the containers in Kubernetes are living on machines that are visible to Kubernetes...
For these serverless platforms, it might have been tempting to develop an entirely new orchestrator, but the truth is that the world is consolidating around the Kubernetes orchestration API, and the value of seamless integration with existing Kubernetes tooling is very attractive.
In the never-ending quest to do more with less, IT departments are always looking for ways to save money without sacrificing the high availability, performance and security needed in business-critical enterprise applications. When Microsoft began supporting SQL Server on Linux in 2017, many organizations considered migrating to this open source operating system in both private and public clouds. But they quickly discovered that some essential capabilities available in a Windows environment were not yet supported on Linux.
One of the most challenging of these issues involves ensuring high availability with robust replication and automatic failover. Most Linux distributions give IT departments two equally bad choices for high availability: either pay more for SQL Server Enterprise Edition to implement Always On Availability Groups, or struggle to make complex do-it-yourself HA Linux configurations work well, something that can be extraordinarily difficult to do.
This unsatisfactory situation has given rise to some new, third-party high availability solutions for SQL Server applications running in a Linux environment. But before discussing these new solutions, it is instructive to understand more about the two current choices.
The problem with using Enterprise Edition is rather apparent: It undermines the cost-saving rationale for using open source operating system software on commodity hardware. For a limited number of small SQL Server applications, it might be possible to justify the additional cost. But it’s too expensive for many database applications and will do nothing to provide general-purpose HA for Linux.
Providing HA across all applications running in a Linux environment is possible using open source software, such as Pacemaker and Corosync, or SUSE Linux Enterprise High Availability Extension. But getting the full software stack to work as desired requires creating (and testing) custom scripts for each application, and these scripts often need to be retested and updated after even minor changes are made to any of the software or hardware being used. Availability-related capabilities that are currently unsupported in both SQL Server Standard Edition and Linux can make this effort all the more challenging.
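To give a flavor of that do-it-yourself work, here is a minimal Pacemaker sketch using the pcs tool to define a floating IP that follows the active node; the address and resource name are placeholders, and a real SQL Server HA configuration needs far more than this:

```bash
# Create a virtual IP resource that Pacemaker moves to whichever node is active.
# IP address, netmask, and resource name are placeholders for illustration only.
sudo pcs resource create sql-vip ocf:heartbeat:IPaddr2 \
    ip=192.168.1.100 cidr_netmask=24 \
    op monitor interval=30s

# Verify cluster and resource status
sudo pcs status
```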
To make HA both cost-effective and easy to implement, the new solutions take two different, general-purpose approaches. One is storage-based systems that protect data by replicating it within a redundant and resilient storage area network (SAN). This approach is agnostic with respect to the host operating system, but it requires that the entire SAN infrastructure be acquired from a single vendor, and it relies on separate failover provisions to deliver high availability.
The other approach is host-based and involves creating a storage-agnostic SANless cluster across Linux server instances. As an HA overlay, these clusters are capable of operating across both the LAN and WAN in private, public and hybrid clouds. The overlay is also application-agnostic, enabling organizations to have a single, universal HA solution across all applications. While this approach does consume host resources, these are relatively inexpensive and easy to scale in a Linux environment.
Most HA SANless cluster solutions provide a combination of real-time block-level data replication, continuous application monitoring, and configurable failover/failback recovery policies to protect all business-critical applications, including those using Always On Failover Cluster Instances available in the Standard Edition of SQL Server.
Some of the more robust HA SANless cluster solutions also offer advanced capabilities, such as ease of configuration and operation through an intuitive graphical user interface, a choice of synchronous or asynchronous replication, WAN optimization to maximize performance, manual switchover of primary and secondary server assignments for planned maintenance, and the ability to perform regular backups without disruption to the application.
A three-node SANless cluster with two concurrent failures
The diagram above shows how a SANless cluster is able to handle two concurrent failures. The basic operation is the same in the LAN and WAN, as well as across private, public and hybrid clouds. Server #1 is initially the primary, replicating data to both servers #2 and #3. When server #1 experiences a problem, a failover is automatically triggered to server #2, which becomes the new primary and replicates data to server #3.
In this situation, the IT department would likely begin diagnosing and repairing whatever problem caused server #1 to fail. Once fixed, it could be restored as the primary or server #2 could continue in that capacity replicating data to both servers #1 and #3. Should server #2 fail before server #1 is returned to operation, a failover would be triggered to server #3.
With most HA SANless clustering solutions, failovers are automatic, and both failovers and failbacks can be controlled from a browser-based console. This enables a three-node configuration like this one to be used for maintenance purposes while continuously providing high availability for the application and its data.
Michael Traudt, SIOS Technology Senior Solutions Architect, brings 16 years of experience in high availability, DR, and backup and recovery. Focused on tier-one application use cases, he has hands-on experience building environments from the ground up, based on specific needs, to demonstrate applied features, run performance and scalability testing, or collect competitive analysis metrics.
Cloud Foundry is large and complex, because that is what happens when we build software to automate tasks we’ve been doing manually. In this series, we are previewing the Cloud Foundry for Developers training course to help you better understand what Cloud Foundry is and how to use it. In case you missed them, you can catch up with the previous articles in this series.
Back in the olden days, provisioning and managing IT stacks was complex, time-consuming, and error-prone. Getting the resources to do your job could take weeks or months.
Infrastructure-as-a-Service (IaaS) was the first major step in automating IT stacks, and introduced the self-service provisioning and configuration model. VMware and Amazon were among the largest early developers and service providers.
Platform-as-a-Service (PaaS) adds the layer to IaaS that provides application development and management.
Cloud Foundry is for building PaaS projects, which bundle servers, networks, storage, operating systems, middleware, databases, and development tools into scalable, centrally managed hardware and software stacks. That is a lot of work to do manually, so it takes a lot of software to automate it.
Cloud Foundry Command-Line Interface
The Cloud Foundry command-line interface (CLI) is the cf command. You run cf on your local machine to log in to remote Cloud Foundry instances and perform operations such as viewing logs, managing apps, running health checks, and managing buildpacks, users, and plugins, among many others. cf is written in Go, and it is extensible via plugins.
The Cloud Controller exposes the REST APIs of Cloud Foundry. This is the endpoint that the cf command talks to when you interact with a Cloud Foundry instance.
As a developer, one of the first things you will do is push an application to Cloud Foundry using the CLI. The Cloud Controller responds to client requests and then interacts with the appropriate Cloud Foundry components to deploy, run, and manipulate your applications and services. In part five of this series we will learn how to push an application.
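As a preview of that workflow, here is a minimal sketch of logging in and pushing an app with the cf CLI; the API endpoint, org, space, and app names are placeholders:

```bash
# Log in to a Cloud Foundry instance (endpoint, org, and space are placeholders)
cf login -a https://api.example.com -o my-org -s my-space

# Push the application in the current directory under a placeholder name
cf push my-app

# List deployed apps and recent events for the new app
cf apps
cf events my-app
```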
The Diego Container Management System is responsible for the lifecycle of applications and tasks. Diego sees applications as long-running processes, like a Java-based online transaction processing (OLTP) application, or a Rails Web application. One-off processes, e.g. a database migration, are called tasks. Both applications and tasks are run inside containers.
Diego contains one or more compute nodes (virtual machines) called cells. Cells run containers, which execute applications and tasks.
The router is responsible for routing traffic into applications and to the cloud controllers. As application instances are spun up and down, die and are recreated, fast automatic updating of route tables is crucial. Your applications might scale up or down. Instances might crash or Diego Cells might go offline. The router ensures traffic is routed appropriately to live, available instances.
Buildpacks manage dependencies, such as Ruby, Node, or Java Runtime Environment (JRE). As a developer, you want your applications to run in a consistent manner. Buildpacks provide this consistency for developers and operators, by centralizing the container and runtime configuration logic, and are extensible and customizable.
Buildpacks prepare applications for execution inside containers in Diego. Applications go through a process called staging. This is where runtime dependencies are added, memory allocations are calculated, and start commands are set. The output is called a droplet, which is a combination of the application and the runtime dependencies.
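By default the platform detects which buildpack to use during staging, but you can also see what is installed and pin one explicitly at push time; a small sketch with placeholder names:

```bash
# List the buildpacks installed on this Cloud Foundry platform
cf buildpacks

# Push an app with an explicit buildpack instead of relying on detection
cf push my-app -b java_buildpack
```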
How do we know what is happening on this complex, always-changing platform? We need logs and metrics. Loggregator aggregates your logs automatically. Cells collect logs from all application instances and forward them to the Loggregator.
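From the developer’s side, that aggregated output is available through the same CLI; a quick sketch with a placeholder app name:

```bash
# Dump the most recent log lines for an app, then stream new ones as they arrive
cf logs my-app --recent
cf logs my-app
```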
The User Account and Authentication (UAA) service provides fine-grained control for our complex systems. UAA is an OAuth2 provider for securing the Cloud Foundry platform, and for individual applications. It provides enterprise-level authentication features such as single sign-on and the use of an LDAP store for user credentials.
Applications often need services like databases, caches and messaging systems. Service brokers provide access to these services in Cloud Foundry through a standardized interface. This interface allows services to be provisioned and consumed using the Cloud Foundry APIs, without knowledge of the underlying service. For example, you can write an application that leverages MySQL without knowing how to deploy or manage MySQL, or even knowing how the broker is provisioning the database.
Service brokers implement an API and are registered with the Cloud Controller. Developers issue service management commands via the Cloud Foundry CLI, which communicates them to the Cloud Controller, which, in turn, invokes the service APIs and provisions, de-provisions, configures, and manages the underlying service.
The Service Broker API provides a key extension point for Cloud Foundry. If you can implement the API, you can expose your services in the platform.
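In practice, consuming a brokered service looks like browsing the marketplace, provisioning an instance, and binding it to an app; a minimal sketch in which the service, plan, instance, and app names are placeholders that depend on which brokers are registered:

```bash
# See which services and plans the registered brokers offer
cf marketplace

# Provision a service instance (service, plan, and instance names are placeholders)
cf create-service p-mysql 100mb my-db

# Bind the instance to an app and restage so the credentials are injected
cf bind-service my-app my-db
cf restage my-app
```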
At the recent Embedded Linux Conference + OpenIoT Summit, I sat down with Jonathan Corbet, the founder and editor-in-chief of LWN to discuss a wide range of topics, including the annual Linux kernel report.
The annual Linux Kernel Development Report, released by The Linux Foundation, is the evolution of work Corbet and Greg Kroah-Hartman had been doing independently for years. The goal of the report is to document various facets of kernel development, such as who is doing the work, what the pace of the work is, and which companies are supporting it.
Linux kernel contributors
To learn more about the companies supporting Linux kernel development in particular, Corbet wrote a set of scripts around the release of kernel 2.6.20 to pull that information out of the kernel repository. The data helped Corbet associate contributions with employers whenever possible.
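The scripts themselves (which grew into the gitdm tool) do considerably more, but the raw material is simply the repository history. A rough sketch of the kind of query they start from, using a placeholder release range:

```bash
# Per-author commit counts for a release range (placeholder range shown)
git shortlog -ns --no-merges v2.6.19..v2.6.20 | head -20

# A crude view of contributions by e-mail domain, a first step toward
# mapping contributions to employers
git log --no-merges --pretty='%ae' v2.6.19..v2.6.20 |
awk -F@ '{ print $2 }' | sort | uniq -c | sort -rn | head -20
```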
There’s common wisdom in the open source world: Everybody knows that the documentation is awful, that nobody wants to write it, and that this is just the way things are. But the truth is that there are lots of people who want to write the docs. We just make it too hard for them to participate. So they write articles on Stack Overflow, on their blogs, and on third-party forums. Although this can be good, it’s also a great way for worst-practice solutions to bloom and gain momentum. Embracing these people and making them part of the official documentation effort for your project has many advantages.
Unlike writing fiction, where the prevailing advice is just start writing, when it comes to technical writing, you need to plan a bit. Before you start, there are several questions you should ask.
Who?
The first of these is: who? Who are you writing to? Some professional tech writers create personas so that when they are writing, they can think to themselves, “What would Monica need to know in this situation?” or “What kind of problem is Marcus likely to have around this topic?” and then write accordingly.
Over the last year, I’ve been working on use cases with high-performance computing (HPC) on OpenStack.
In this post, I’ll offer some considerations for hosting high-performance and high-throughput workloads.
First, let’s start with the three types of architectures that can be used when hosting HPC workloads on OpenStack:
Virtualized HPC on OpenStack
In this architecture, all components of the HPC cluster are virtualized in OpenStack.
Bare-metal HPC on OpenStack
All components of the HPC cluster are deployed on bare-metal servers using OpenStack Ironic.
Virtualized head node and bare-metal compute nodes
The head node (scheduler, master, and login node) is virtualized in OpenStack, and the compute nodes are deployed on bare-metal servers using OpenStack Ironic.
Now that you have an overview of the three types of architecture that can deploy HPC software in OpenStack, I’m going to discuss a few OpenStack best practices when hosting these types of workloads.
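To make the difference between the virtualized and bare-metal paths concrete, here is a minimal sketch of launching a compute node both ways with the openstack CLI; the flavor, image, network, and key names are placeholders, and the bare-metal case assumes Ironic is configured with a matching bare-metal flavor:

```bash
# Virtualized HPC compute node: an ordinary Nova instance
openstack server create \
    --flavor hpc.large \
    --image centos7-hpc \
    --network hpc-net \
    --key-name hpc-key \
    compute-node-01

# Bare-metal HPC compute node: same API call, but the flavor maps to an
# Ironic-managed physical machine (all names are placeholders)
openstack server create \
    --flavor baremetal-hpc \
    --image centos7-hpc \
    --network hpc-net \
    --key-name hpc-key \
    bm-compute-node-01
```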
While not yet officially announced, the Cinnamon 3.8 desktop environment has been released and it’s already available in the repositories of some popular GNU/Linux distributions, such as Arch Linux.
Scheduled to ship with the upcoming Linux Mint 19 “Tara” operating system series this summer, the Cinnamon 3.8 desktop environment is now available for download and it’s a major release that brings numerous improvements, new features, and lots of Python 3 ports for a bunch of components.
Among the components that got ported to Python 3 in the Cinnamon 3.8 release, we can mention cinnamon-settings, cinnamon-menu-editor, cinnamon-desktop-editor, cinnamon-settings-users, melange, background slideshow, the switch editor and screensaver lock dialogs, desktop file generation scripts, as well as all the utilities.