
Taming the Chaos of Modern Caches

“If you’re a bit tired, this is a presentation on cache maintenance, so there will be plenty of opportunity to sleep.” Despite this warning from ARM Ltd. kernel developer Mark Rutland at his recent Embedded Linux Conference presentation, Stale Data, or How We (Mis-)manage Modern Caches, it was actually kind of an eye opener — at least as far as cache management presentations go.

For one thing, much of what you think you know about the subject is probably wrong. It turns out that software — and computer education curricula — have not always kept up with new developments in hardware. “Cache behavior is surprisingly complex, and caches behave in subtly different ways across SoCs,” Rutland told the ELC audience. “It’s very easy to misunderstand the rules of how caches work and be lulled into a false sense of security.”

SoC Tricks

Even within a single chip architecture, every system-on-chip (SoC) is integrated slightly differently. Modern SoCs perform a number of tricks, such as speculation, to offer better power/performance, and it is easy for the unwary developer to be surprised by their side effects on caches.

“By hitting the cache, you can avoid putting traffic on the memory interconnect, which can be clocked at a lower speed,” said Rutland. “The CPU can do more work more quickly and go to sleep more quickly. Modern CPUs do fewer write-backs to memory and try to keep data in caches for as long as possible. They allow multiple copies of memory locations to exist.”

Almost every CPU does some automatic prefetching and buffering of stores, and many also do out-of-order execution and some level of speculation. As a result, “your code might not match the reality of what the CPU happens to be doing,” said Rutland. “The CPU might speculate something completely erroneously, and start preloading data into caches, and it might turn out that those accesses never existed in your code. It’s incredibly nondeterministic.”

Not only is this behavior “really difficult” to predict, but “over time the CPU gets more aggressive, so it will be even more difficult,” added Rutland. Other trends that add to the complexity are the growth in multi-core SMP systems and complex configurations such as big.LITTLE, in which different CPU implementations can exist within a single system. Newer technologies like coherent DMA masters can solve some cache problems but also add more factors to juggle. “We’re also beginning to see things like GPUs accessing memory a lot in weird and varied patterns,” said Rutland.

In this chaotic environment, it can be difficult to determine cache coherence, which Rutland defines as “two accesses appearing to use the same copy, rather than the caches themselves having the same property of data.” However, as Rutland noted, all this hardware follows a common set of rules when it comes to the behavior of caches, which can be simpler to think about.

When following these rules, “you have to be a lot more stringent in your cache management to make sure you get what you expect,” said Rutland. “With all these complex cache coherence protocols, misuse or inconsistent use can lead to a long-term loss of coherence. It’s incredibly important to reason about the behavior of the caches in the background and understand what cache maintenance primitives are available.”

Explaining the Mystery of Caches

Rutland then set out to explain the mystery of caches, at least on modern multi-core ARMv8 SoCs. Most of the guidelines, which combine ARM Architecture Reference Manual rules and street wisdom about CPU caching behavior, are also applicable to ARMv7.

Rutland started by discussing the cacheability options for a normal memory location: non-cacheable, write-through, and write-back. Write-through, typically used for frame buffers, “means that when you write to a location, both memory and caches are updated at the same time, but reads might only look in caches,” said Rutland. Operating systems typically use write-back, where “you don’t really care if the memory is up to date.” These attributes are controlled separately for inner caches, close to a CPU cluster, and outer caches.

Rutland went on to cover shareability domains, which include non-, inner-, outer-, and system-shareability. A single OS or hypervisor typically uses inner-shareability, whereas outer-shareability domains allow multiple independent OSes to run on a large, complex, multi-core system.

Then there are cache states to keep in mind, as well as cache coherence protocols such as MSI. “It’s very difficult to reason about the precise state of caches because they may have coherence protocol specific data associated with cache entries,” said Rutland.

Cache states can be grouped into invalid, clean, and dirty. As you might expect, the most challenging is the dirty state. “Caches can write back dirty lines at any time for any reason, such as making space for things that they erroneously speculated,” said Rutland.
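To make that invalid/clean/dirty grouping concrete, here is a deliberately simplified C sketch that models a single cache line using MSI-style state names. It is a conceptual illustration only, not how any real coherence controller is built, and the function names are invented for the example.

#include <stdio.h>

/* MSI-style states: INVALID holds no data, SHARED is clean, MODIFIED is dirty. */
enum line_state { LINE_INVALID, LINE_SHARED, LINE_MODIFIED };

struct cache_line {
    enum line_state state;
};

/* A local store makes the line dirty: it now differs from main memory. */
static void cpu_write(struct cache_line *line)
{
    line->state = LINE_MODIFIED;
}

/* The cache may write a dirty line back at any time, for any reason,
 * e.g. to make room for a speculative prefetch; the line becomes clean. */
static void hardware_writeback(struct cache_line *line)
{
    if (line->state == LINE_MODIFIED)
        line->state = LINE_SHARED;
}

int main(void)
{
    struct cache_line line = { .state = LINE_SHARED };

    cpu_write(&line);            /* dirty: memory is now stale            */
    hardware_writeback(&line);   /* may happen whenever the cache chooses */

    printf("final state: %d\n", line.state);
    return 0;
}

The point of the toy model is the middle step: software never controls when hardware_writeback happens, which is exactly why inconsistent cache maintenance can go unnoticed for a long time and then fail.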

Surprisingly to many, caches are never fully turned off in ARM systems. “Even when the MMU is off or the CPU isn’t making cacheable accesses, data can still sit in the cache or dirty data can be written back at any arbitrary point in time,” said Rutland.

Cache Maintenance

Rutland went on to discuss the different types of cache maintenance operations: clean, which writes any dirty data back so the cached copy and memory agree; invalidate, which discards data from the cache; and clean+invalidate, which combines the two. In ARM, however, there is no such thing as “flushing,” he added, noting that the term is ambiguous. “People should stop talking about flushing the caches. It means absolutely nothing.”

Rutland also warned against using the popular Set/Way instructions, which are intended only for the defined power-up/power-down cache management sequences, unless one is intimately aware of how that particular CPU behaves with Set/Way. Long story short: “Misuse of Set/Way can result in a complete loss of coherence…and cause horrible problems,” said Rutland. “Instead you should use VA [virtual address] cache maintenance, which gets a set of memory attributes from the MMU, including the shareability domain of that VA.”
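For a sense of what VA-based maintenance looks like in practice, here is a minimal sketch in C with AArch64 inline assembly, assuming a 64-byte cache line and a context where these instructions are permitted (they can trap below EL1 depending on configuration). The helper names are invented for illustration; real code should read the line size from CTR_EL0 and prefer the kernel’s own cache maintenance helpers.

#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed line size; query CTR_EL0 in real code */

/* Clean by VA to the point of coherency: write dirty data back to memory,
 * leaving the lines in the cache. */
static inline void dcache_clean_range(void *start, size_t len)
{
    uintptr_t addr = (uintptr_t)start & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end  = (uintptr_t)start + len;

    for (; addr < end; addr += CACHE_LINE)
        asm volatile("dc cvac, %0" : : "r"(addr) : "memory");
    asm volatile("dsb sy" : : : "memory");  /* wait for completion */
}

/* Clean+invalidate by VA: write dirty data back, then drop the lines,
 * e.g. before handing a buffer to a non-coherent DMA device. */
static inline void dcache_clean_inval_range(void *start, size_t len)
{
    uintptr_t addr = (uintptr_t)start & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end  = (uintptr_t)start + len;

    for (; addr < end; addr += CACHE_LINE)
        asm volatile("dc civac, %0" : : "r"(addr) : "memory");
    asm volatile("dsb sy" : : : "memory");
}

Because the operations take a virtual address, the hardware can look up that address’s memory attributes and shareability domain in the MMU and broadcast the maintenance to the right caches, which is exactly what Set/Way operations cannot do.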

After answering questions from the audience, Rutland concluded: “Thank you for staying awake.” You’re welcome, but it’s tough to sleep while hearing about the horrors of modern cache maintenance.

Watch the complete presentation below:

https://www.youtube.com/watch?v=F0SlIMHRnLk


New Toolset Makes it Possible to Build and Ship Docker Containers Within Ansible

Using Ansible playbooks instead of Docker’s tools opens the door to new kinds of dev automation. A new project from the creators of the system automation framework Ansible, now owned by Red Hat, aims to make it possible to build Docker images and perform container orchestration within Ansible.

Ansible Container, still in the early stages of development, allows developers to use an Ansible playbook (the language that describes Ansible jobs) to outline how containers should be built; it uses Ansible’s stack to deploy those applications as well.

Read more at InfoWorld

Virtualization at Scale: How to Operationalize and Commercialize

One major reason operators have not achieved full-scale operationalization is the shortcomings of their existing network management and siloed operations support systems (OSS), which limit their ability to effectively fulfill and assure services in a hybrid environment. In many instances, operators have taken a myopic, bottom-up approach, deploying solutions solely to manage the VNF components, which only adds complexity to an already complex hybrid physical and virtual network environment. Beyond that lack of alignment between service fulfillment and service assurance, operators’ networks may also lack integrity between service configuration and device configuration, automatic discovery and reconciliation capability, real-time policy-driven service management, the evolution of a centralized catalog to manage and blend both virtualized and non-virtualized services, and multi-party compensation and revenue management capability, all of which can hinder commercialization of SDN and NFV.

Read more at Telecom Engine

The Small Batches Principle

Reducing waste, encouraging experimentation, and making everyone happy

Q: What do DevOps people mean when they talk about small batches?

A: To answer that, let’s take a look at an unpublished chapter from the upcoming book The Practice of System and Network Administration, third edition, due out in October 2016.

One of the themes you will see in this book is the small batches principle: it is better to do work in small batches than big leaps. Small batches permit us to deliver results faster, with higher quality and less stress.

We begin with an example that has nothing to do with system administration in order to demonstrate the general idea. Then we focus on three IT-specific examples to show how the method applies and the benefits that follow.

The small batches principle is part of the DevOps methodology. It comes from the lean manufacturing movement, which is often called just-in-time manufacturing. It can be applied to just about any kind of process. It also enables the MVP (minimum viable product) methodology, which involves launching a small version of a service to get early feedback that informs the decisions made later in the project.

Read more at ACM Queue

Data Center SDN: Comparing VMware NSX, Cisco ACI, and Open SDN Options

The data center network layer is the engine that manages some of the most important business data points you have. Applications, users, specific services, and even entire business segments are all tied to network capabilities and delivery architectures. And with all the growth around cloud, virtualization, and the digital workspace, the network layer has become even more important.

Most of all, we’re seeing more intelligence and integration taking place at the network layer. The biggest evolution in networking includes integration with other services, the integration of cloud, and network virtualization. Let’s pause there and take a brief look at that last concept…

There are several vendors offering a variety of flavors of SDN and network virtualization, so how are they different? Are some more open than others? Here’s a look at some of the key players in this space.

Read more at Datacenter Knowledge

LzLabs Launches Product to Move Mainframe COBOL Code to Linux Cloud

Somewhere in a world full of advanced technology that we write about regularly here on TechCrunch, there exists an ancient realm where mainframe computers are still running programs written in COBOL.

This is a programming language, mind you, that was developed in the late 1950s, and used widely in the ’60s and ’70s and even into the ’80s, but it’s never really gone away. You might think it would have been mostly eradicated from modern business by now, but you would be wrong.

As we march along, however, the pool of people who actually know how to maintain these COBOL programs grows ever smaller by the year, and companies looking to move the data (and even the archaic programs) to a more modern platform could be stuck without personnel to help guide them through the transition.

Read more at TechCrunch

Set up SSL Certificates in 5 Minutes Using Let’s Encrypt

Let’s Encrypt simplifies the process of installing SSL certificates and allows you to set up a free SSL certificate on your Web site in just a few minutes.

How Cloud Computing is Driving Demand for Open Source Talent

There are very few technologies that have had as big an influence on businesses today as cloud computing. It has completely changed the way businesses and tech teams think and function. Prior to the adoption of public cloud technologies, businesses relied heavily on data centers to store and process their information, and a tech professional’s operational expertise was focused on hardware (i.e., servers, storage, and networking). The rise of cloud computing, however, caused this operational expertise to shift. Today, more companies are defining infrastructure as software, and defining infrastructure as code requires open source professionals to adopt new skill sets to help define and manage software-defined infrastructures.

For an open source professional, familiarity with major cloud vendors such as Amazon Web Services and Microsoft Azure is critical from a hiring manager’s perspective. While these skills are not necessarily “open source,” knowing how vendors define infrastructure is crucial for deploying cloud-based services and supporting everything from application hosting and data storage to content distribution. Therefore, open source professionals will need to supplement their skills with a strong working knowledge of these platforms.

Vendor and cloud-related skills are among the fastest growing on Dice, with job postings for professionals with Azure experience, for example, up 87 percent year-over-year. A keen understanding of smaller virtual private server (VPS) providers, such as Linode or DigitalOcean, is also valuable from a professional development standpoint. Employers often use these providers in conjunction with major cloud vendors as a means to reduce costs and ensure higher levels of service.

To supplement vendor skills, employers are also in the market for open source professionals with experience in configuration management tools like Puppet, Chef, Ansible, or SaltStack. Configuration management has become a growing point of entry into the open source community for many companies. All of the major configuration management vendors and sponsors began as open source projects, and most continue to do the majority of their development in an open source model.

Configuration management can help companies build out infrastructure in a fully automated and repeatable fashion. Rather than relying on manual configuration and custom scripting to create and manage infrastructure, companies are using tools like Puppet and Chef to expedite the deployment process and eliminate human error. On Dice, there are roughly 1,700 Puppet postings and 1,600 Chef job postings on any given day, each representing roughly 2 percent of the more than 87,000 total jobs posted on the site.

The rise of cloud computing has revolutionized the way companies and tech teams operate today. Perhaps this is why more than half (51 percent) of hiring managers and recruiters found cloud technologies to have the biggest impact on open source hiring in 2016, according to the 2016 Open Source Jobs Report. As an open source professional, expanding one’s knowledge base to include cloud-related skills isn’t just smart; it’s almost a necessity. It also doesn’t hurt that tech professionals with cloud experience are well compensated: Dice’s latest annual salary survey found that cloud (as well as big data) skills accounted for the majority of 2015’s highest earners, who made $131,121 to $142,845 on average. Cloud computing is a mainstay of the tech industry, and it continues to weigh heavily on employers’ minds as they make open source hiring decisions.

Yuri Bykov manages Data Science at Dice.


The Growth of the Linux and Open Source Channel since 1989

The Linux kernel was born twenty-five years ago this summer. Since that time, a thriving partner ecosystem has arisen around open source platforms built on Linux, GNU, and other free and open source software products. Here’s a look at milestones in the evolution of the Linux channel and partner ecosystem.

Read more at The VAR Guy

Contributing to Apache Mesos: Where to Begin – Joris Van Remoortere & Michael Park, Mesosphere

https://www.youtube.com/watch?v=SnPmU61fVjQ?list=PLGeM09tlguZQVL7ZsfNMffX9h1rGNVqnC

Contributing to a large complex project involves a fair bit of bureaucracy. There are standards and procedures to follow. The Mesos project provides a lot of help and support for contributors, so watch Van Remoortere and Park’s talk to learn the right way to become a Mesos contributor.