
Jonathan Corbet on Linux Kernel Contributions, Community, and Core Needs

At the recent Embedded Linux Conference + OpenIoT Summit, I sat down with Jonathan Corbet, the founder and editor-in-chief of LWN, to discuss a wide range of topics, including the annual Linux kernel report.

The annual Linux Kernel Development Report, released by The Linux Foundation, is the evolution of work Corbet and Greg Kroah-Hartman had been doing independently for years. The goal of the report is to document various facets of kernel development: who is doing the work, how fast the work is progressing, and which companies are supporting it.

Linux kernel contributors

To learn more about the companies supporting Linux kernel development in particular, Corbet wrote a set of scripts, beginning with the release of kernel 2.6.20, to pull that information out of the kernel repository. The data helped Corbet associate contributions with employers whenever possible.

Read more and watch the video of Corbet’s presentation at The Linux Foundation

RTFM? How to Write a Manual Worth Reading

There’s common wisdom in the open source world: Everybody knows that the documentation is awful, that nobody wants to write it, and that this is just the way things are. But the truth is that there are lots of people who want to write the docs. We just make it too hard for them to participate. So they write articles on Stack Overflow, on their blogs, and on third-party forums. Although this can be good, it’s also a great way for worst-practice solutions to bloom and gain momentum. Embracing these people and making them part of the official documentation effort for your project has many advantages.

Unlike writing fiction, where the prevailing advice is to just start writing, technical writing requires a bit of planning. Before you start, there are several questions you should ask.

Who?

The first of these is: who? Who are you writing for? Some professional tech writers create personas so that when they are writing, they can think to themselves, “What would Monica need to know in this situation?” or “What kind of problem is Marcus likely to have around this topic?” and then write accordingly.

Read more at OpenSource.com

Recommendations for High-Performance Computing on OpenStack

Over the last year, I’ve been working on use cases with high-performance computing (HPC) on OpenStack.

In this post, I’ll offer some considerations about hosting high performance and high-throughput workloads.

First, let’s start with the three types of architectures that can be used when hosting HPC workloads on OpenStack:

  1. Virtualized HPC on OpenStack
    In this architecture, all components of the HPC cluster are virtualized in OpenStack.
  2. Bare-metal HPC on OpenStack
    All components of the HPC cluster are deployed on bare-metal servers using OpenStack Ironic.
  3. Virtualized head node and bare-metal compute nodes
    The head node (scheduler, master, and login node) is virtualized in OpenStack, and the compute nodes are deployed on bare-metal servers using OpenStack Ironic.

Now that you have an overview of the three architectures for deploying HPC software on OpenStack, I’m going to discuss a few OpenStack best practices for hosting these types of workloads.

Read more at SuperUser

Cinnamon 3.8 Desktop Environment Released with Python 3 Support, Improvements

While not yet officially announced, the Cinnamon 3.8 desktop environment has been released and it’s already available in the repositories of some popular GNU/Linux distributions, such as Arch Linux.

Scheduled to ship with the upcoming Linux Mint 19 “Tara” operating system series this summer, the Cinnamon 3.8 desktop environment is now available for download and it’s a major release that brings numerous improvements, new features, and lots of Python 3 ports for a bunch of components.

Among the components that got ported to Python 3 in the Cinnamon 3.8 release, we can mention cinnamon-settings, cinnamon-menu-editor, cinnamon-desktop-editor, cinnamon-settings-users, melange, background slideshow, the switch editor and screensaver lock dialogs, desktop file generation scripts, as well as all the utilities.

Read more at Softpedia

Episode 7: The Exact Opposite of a Job Creator

Monitoring in the entire technical world is terrible and continues to be a giant, confusing mess. How do you monitor? Are you monitoring things the wrong way? Why not hire a monitoring consultant!       

Today, we’re talking to monitoring consultant Mike Julian, who is the editor of the Monitoring Weekly newsletter and author of O’Reilly’s Practical Monitoring. He is the voice of monitoring.

Some of the highlights of the show include:

  • Observability comes from control theory and monitoring is for what we can anticipate

  • Industry’s lack of interest and focus on monitoring

  • When there’s an outage, why doesn’t monitoring catch it? Unforeseen things.

  • Cost and failure of running tools and systems that are obtuse to monitor

  • Outsource monitoring instead of devoting time, energy, and personnel to it

  • Outsourcing infrastructure means you give up some control; how you monitor and manage systems changes when on the Cloud

Read more / listen to the episode at Screaming In The Cloud

Developers: Prepare Your Drivers for Real-Time Linux

Although Real-Time Linux (RT Linux) has been a staple at Embedded Linux Conferences for years — here’s a story on the RT presentations in 2007 — many developers have viewed the technology to be peripheral to their own embedded projects. Yet as RT, enabled via the PREEMPT_RT patch, prepares to be fully integrated into the mainline kernel, a wider circle of developers should pay attention. In particular, Linux device driver authors will need to ensure that their drivers play nice with RT-enabled kernels.

Julia Cartwright speaking at Embedded Linux Conference.

At the recent Embedded Linux Conference in Portland, National Instruments software engineer Julia Cartwright, an acting maintainer on a stable release of the RT patch, gave a well-attended presentation called “What Every Driver Developer Should Know about RT.” Cartwright started with an overview of RT, which helps provide guarantees for user task execution for embedded applications that require a high level of determinism. She then described the classes of driver-related problems that can have a detrimental impact on RT, as well as potential resolutions.

One of the challenges of any real-time operating system is that most target applications have two types of tasks: those with real-time requirements and latency sensitivity, and non-time-critical tasks such as disk monitoring, throughput, or I/O. “The two classes of tasks need to run together and maybe communicate with one another with mixed criticality,” explained Cartwright. “You must resolve two different degrees of time sensitivity.”

One solution is to split the tasks by using two different hardware platforms. “You could have an Arm Cortex-R, FPGA, or PLD based board for super time-critical stuff, and then a Cortex-A series board with Linux,” said Cartwright. “This offers the best isolation, but it raises the per unit costs, and it’s hard to communicate between the domains.”

Another approach is to use the virtualization approach provided by Xenomai, the other major Linux-based solution aside from PREEMPT_RT. “Xenomai follows a hypervisor, co-kernel approach using a hypervisor or AMP solution to separate an RTOS from Linux,” said Cartwright. “However, there’s still a tradeoff in limited communications between the two systems.”

RT Linux’s PREEMPT_RT, meanwhile, enables the two systems to share a kernel, scheduler, device stack, and subsystems. In addition, Linux IPC mechanisms can be used to communicate between the two. “There’s not as much isolation, but much greater communication and usability,” said Cartwright.

One challenge with RT is that “because the drivers are shared between the real-time and non real-time systems, they can misbehave,” said Cartwright. “A lot of the bugs we’re finding in RT come from device drivers.”

In RT, the time between when an event occurs – such as a timer firing an interrupt or an I/O device requesting service – and the time when the real-time task executes is called the delta. “RT systems try to characterize and bound this in some meaningful way,” explained Cartwright. “A cyclic test takes a time stamp, then sleeps for a set time such as 10ms, and then takes a time stamp when the thread wakes up. The difference between the time stamps, which is the amount of time the thread slept, is called the delta.”

This delta can be broken down into two phases. The first is irq_dispatch latency: the time from the hardware interrupt firing until the thread scheduler is told that the handler thread needs to run. The second is scheduling latency: the time from when the scheduler has been made aware that a high-priority task needs to run until the moment the CPU begins executing that task.

irq_dispatch latency

When using mainline Linux without RT extensions, irq_dispatch latency can be considerable. “Say you have one thread executing in user mode, and an external interrupt such as a network event fires that you don’t care about in your real time app,” said Cartwright. “But the CPU is going to vector off into hard interrupt context and start executing the handler associated with that network device. If during that interrupt handler duration, a high priority event fires, it’s not able to be scheduled on the CPU until the low priority interrupt is done executing.”

The delta between the internal event firing and the external event “is a direct contributor to irq_dispatch latency,” said Cartwright. “Without an RT patch it would be a mess to define bounds on this because the bound would be the bound of the longest running interrupt handler in the system.”

RT avoids this latency by forcing irq threads. “There’s very little code that we execute in a hard interrupt context – just little shims that wake up the threads that are going to execute your handler,” said Cartwright. “You may have a low priority task running, and perhaps also a medium priority task that the irq fires, but only a small portion of time is spent waking up the associated handlers for the threads.”

RT also provides other guarantees. For example, because interrupt handlers are now running in a thread, it can be preempted. “If a high priority, real-time critical interrupt fires, that thread can be scheduled immediately, which reduces the irq_dispatch latency,” said Cartwright.

Cartwright said that most drivers require no modification to participate in forced irq threading. In fact, “Thread irq actually exists in mainline now. You can boot a kernel and pass the threadirqs parameter and it will thread all your interrupts. RT will add a forced enablement.”
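For driver authors, explicitly opting in to threaded interrupt handling in mainline looks like the sketch below. The mydev_* names and helpers are invented for illustration; request_threaded_irq(), IRQ_WAKE_THREAD, and IRQF_ONESHOT are the real mainline APIs:

```c
/* Sketch: threaded interrupt handling for a hypothetical device.
 * struct mydev and the mydev_* helpers are illustrative only. */
#include <linux/interrupt.h>

/* Hard-irq handler: runs in hard interrupt context, so it does the
 * bare minimum -- confirm the interrupt is ours, then wake the thread. */
static irqreturn_t mydev_hardirq(int irq, void *dev_id)
{
	struct mydev *dev = dev_id;

	if (!mydev_irq_pending(dev))	/* hypothetical register check */
		return IRQ_NONE;
	return IRQ_WAKE_THREAD;
}

/* Threaded handler: runs in a schedulable kthread, so under RT it can
 * be preempted by higher-priority real-time work. */
static irqreturn_t mydev_thread_fn(int irq, void *dev_id)
{
	mydev_process_events(dev_id);	/* hypothetical heavy lifting */
	return IRQ_HANDLED;
}

static int mydev_setup_irq(struct mydev *dev, int irq)
{
	return request_threaded_irq(irq, mydev_hardirq, mydev_thread_fn,
				    IRQF_ONESHOT, "mydev", dev);
}
```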

Yet there are a few cases when this causes problems. “If your drivers are invoked in the process of delivering an interrupt dispatch, you can’t be threaded,” said Cartwright. “This can happen with irqchip implementations, which should not be threaded. Another issue may arise if the driver is invoked by the scheduler.”

Other glitches can emerge when “explicitly disabling interrupts using local_irq_disable or local_irq_save.” Cartwright recommended against using such commands in drivers. As an alternative to local_irq_disable, she suggested using spinlocks or local locks, a new feature that will soon be proposed for mainline. In a separate presentation at ELC 2018, “Maintaining a Real Time Stable Kernel,” Linux kernel developer Steven Rostedt goes into greater depth on local locks.

Cartwright finished up her discussion of irq_dispatch latency issues by describing some rare, hardware-related MMIO issues that can occur. In one case, accidentally pulling the Ethernet cable during testing caused buffering in the interconnect, which scrambled the interrupts and was a pain to fix. “To solve it we had to follow each write by a readback, which prevents write stacking,” she said. The ultimate solution? “Take the drugs away from the hardware people.”
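The readback workaround she mentions is a one-line pattern in a driver. This sketch uses the kernel’s real writel()/readl() accessors; the regs base and the MYDEV_* register names are hypothetical:

```c
/* Sketch: follow a posted MMIO write with a read from the same device.
 * The read cannot complete until earlier writes have reached the
 * device, flushing the interconnect's buffer and preventing write
 * stacking. MYDEV_IRQ_ACK_ALL and MYDEV_IRQ_ACK are illustrative. */
writel(MYDEV_IRQ_ACK_ALL, regs + MYDEV_IRQ_ACK);
(void)readl(regs + MYDEV_IRQ_ACK);	/* flush the posted write */
```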

Scheduling latency

There appear to be fewer driver problems related to the second latency phase, scheduling latency: the time from when the scheduler learns that a real-time thread must run until that thread actually executes. One example stems from the use of preempt_disable, which prevents a higher-priority thread from being scheduled upon return from interrupt because preemption has been disabled.

“The only reason for a device driver to use preempt_disable is if you need to synchronize with the act of scheduling itself, which can happen with cpufreq and cpuidle,” said Cartwright. “Use local locks instead.”

On mainline Linux, spinlock-protected critical sections are implicitly executed with preemption disabled, which can similarly lead to latency problems. “With RT we solve this by making spinlock critical sections preemptible,” said Cartwright. “We turn them into pi-aware mutexes and disable migration. When a spinlock is held by a thread, it can be preempted by a higher priority thread to bring in the outer bound.”

Most drivers require no changes in order to have their spinlock critical sections preemptible, said Cartwright. However, if a driver is involved in interrupt dispatch or scheduling, it must use raw_spin_lock(), and all of its critical sections must be made minimal and bounded.
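For drivers that genuinely sit on those paths, the change is mechanical but the discipline matters. A hedged sketch with invented lock, chip, and handler names; raw_spinlock_t and the raw_spin_lock_irqsave() API are real kernel interfaces:

```c
#include <linux/irq.h>
#include <linux/spinlock.h>

/* Sketch: an irqchip-style driver on the interrupt-dispatch path.
 * Under RT an ordinary spinlock_t becomes a sleeping, PI-aware mutex,
 * which is forbidden here, so a raw_spinlock_t (a true spinning lock)
 * is used instead. Keep the critical section minimal and bounded. */
static DEFINE_RAW_SPINLOCK(mychip_lock);

static void mychip_ack_irq(struct irq_data *d)	/* hypothetical chip */
{
	unsigned long flags;

	raw_spin_lock_irqsave(&mychip_lock, flags);
	/* bounded work only: acknowledge the interrupt in hardware */
	raw_spin_unlock_irqrestore(&mychip_lock, flags);
}
```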

You can watch the complete presentation below:

Join us at Open Source Summit + Embedded Linux Conference Europe in Edinburgh, UK on October 22-24, 2018, for 100+ sessions on Linux, Cloud, Containers, AI, Community, and more. 

Extending the Kubernetes Cluster API

One major downside for developers is that Kubernetes has no native functionality to manage its nodes or other clusters. As a consequence, operations must get involved every time a new cluster or worker node is needed. Of course, several ways exist to create a cluster and to add a new worker node, but all of them require specific domain knowledge.

Cluster API

The Cluster API is a new working group under the umbrella of sig-cluster-lifecycle. The group’s objective is to design a simple, declarative API for creating clusters and machines. The working group is in the very early stages of defining all the API types, but an example for GCP already exists.

The whole Cluster API currently consists of four components: one API server and three controllers.

API server

The Cluster API in its current state provides an extension API server that is responsible for CRUD operations on the API resources.

Controllers

Currently there are three controllers planned:

  • MachineController

  • MachineSet

  • MachineDeployment

The MachineController is meant to be provider specific, as each provider has its own way of managing machines. MachineSet and MachineDeployment, on the other hand, will be generic controllers that simply generate Machine and MachineSet resources, respectively. This approach mirrors the way a Deployment manages ReplicaSets and a ReplicaSet manages Pods.

But, of course, a provider could also implement its own MachineSet and MachineDeployment controllers, which could make sense given Auto Scaling groups on AWS or node pools on GKE.

API types

The Cluster API introduces four new types:

Cluster

A Cluster represents a Kubernetes cluster, with configuration for the entire control plane except for node specifics.


Machine

A Machine represents an instance at a provider, which can be any kind of server: an AWS EC2 instance, a PXE-booted bare-metal server, or a Raspberry Pi.


MachineSet –  You see where this is going 😉

A MachineSet is similar to a ReplicaSet: a definition for a set of same machines. The MachineSet controller will create machines based on the defined replicas and the machine template.


MachineDeployment

A MachineDeployment is similar to a Deployment: a definition for a well-managed set of machines. The MachineDeployment controller, though, will not directly manage machines but MachineSets. For each change to a MachineDeployment, the controller will create and scale up a new MachineSet to replace the old one.

ProviderConfig

Each Machine and Cluster type has a field called providerConfig within its spec. The field is loosely defined and can hold arbitrary data, which allows provider-specific configuration for each API implementation.

Outlook

Possible ways to utilize this new API would be:

  • Autoscaling

  • Integration with the Cluster Registry API

    • Automatically add a new cluster to a registry, support tooling that works across multiple clusters using a registry, delete a cluster from a registry

  • Streamlining Kubernetes installers by implementing the Cluster API

  • Declarative Kubernetes upgrades for the control plane and kubelets

  • Maintaining consistency of control plane and machine configuration across different clusters / clouds

  • Cloud adoption / lift and shift / liberation

Henrik Schmidt
Henrik Schmidt is a Senior Developer at Loodse. He is passionate about the potential of Kubernetes and cloud native technologies and has been a major contributor to the Open Source projects nodeset and kube-machine.

Henrik’s colleague Guus van Weelden will be speaking on “Let’s Play with Lego and Kubernetes” and his colleague Matthias Loibl will be speaking on “Declarative Multi-Cluster Monitoring with Prometheus” at KubeCon + CloudNativeCon EU, May 2-4, 2018 in Copenhagen, Denmark.

The Role of Site Reliability Engineering in Microservices

While SREs are hotshots in the industry, their role in a microservices environment is not just a natural fit that goes hand-in-hand, like peanut butter and jelly. Instead, while SREs and microservices evolved in parallel inside the world’s software companies, the latter actually makes life far more difficult for the former.

That’s because SREs live and die by their full stack view of the entire system they are maintaining and optimizing. The role combines the skills of a developer with those of an admin, producing an employee capable of debugging applications in production environments when things go completely sideways.

As Google engineers essentially invented the role, the company offers a great deal of insight into how they manage systems that handle up to 100 billion requests a day.  They boil down reliability into an essential element, every bit as desirable as velocity and innovation.

“The initial step is taking seriously that reliability and manageability are important. People I talk to are spending a lot of time thinking about features and velocity, but they don’t spend time thinking about reliability as a feature,” said Todd Underwood, an SRE director at Google.

Read more at The New Stack

Automotive Linux Summit & OS Summit Japan Schedule Announced

Attend Automotive Linux Summit and Open Source Summit Japan in Tokyo, June 20 – 22, for three days of open source education and collaboration.

Automotive Linux Summit connects those driving innovation in automotive Linux from the developer community, with the vendors and users providing and using the code, in order to propel the future of embedded devices in the automotive arena.

Session highlights for Automotive Linux Summit:

  • Enabling Hardware Configuration Flexibility Keeping a Unified Software – Dominig ar Foll, Intel
  • Beyond the AGL Virtualization Architecture – AGL Virtualization Expert Group (EG-VIRT) – Michele Paolino, Virtual Open Systems
  • High-level API for Smartphone Connectivity on AGL – Takeshi Kanemoto, RealVNC Ltd.
  • AGL Development Tools – What’s New in FF? – Stephane Desneux, IoT.bzh

Read more at The Linux Foundation

Suspicious Event Hijacks Amazon Traffic for 2 Hours, Steals Cryptocurrency

Amazon lost control of a small number of its cloud services IP addresses for two hours on Tuesday morning when hackers exploited a known Internet-protocol weakness that let them redirect traffic to rogue destinations. By subverting Amazon’s domain-resolution service, the attackers masqueraded as cryptocurrency website MyEtherWallet.com and stole about $150,000 in digital coins from unwitting end users. They may have targeted other Amazon customers as well.

The incident, which started around 6 AM California time, hijacked roughly 1,300 IP addresses, Oracle-owned Internet Intelligence said on Twitter. …  The 1,300 addresses belonged to Route 53, Amazon’s domain name system service.

The highly suspicious event is the latest to involve Border Gateway Protocol, the technical specification that network operators use to exchange large chunks of Internet traffic. Despite its crucial function in directing wholesale amounts of data, BGP still largely relies on the Internet-equivalent of word of mouth from participants who are presumed to be trustworthy. Organizations such as Amazon whose traffic is hijacked currently have no effective technical means to prevent such attacks.

Read more at Ars Technica