
How Facebook Uses Linux and Btrfs: An Interview with Chris Mason

Chris Mason is the principal author of Btrfs, the open source file system that serves as the default file system for SUSE Linux Enterprise Server. Mason started working on Btrfs at Oracle and then moved to Facebook, where he continued to work on the file system as a member of the company’s Linux kernel team. When Facebook has new kernels that need to go out, Mason helps make sure that everything has been properly tested and meets performance needs.

We sat down with Mason to learn more about the status of Btrfs and how Facebook is using Linux and Btrfs. Here is an edited version of that interview.

Linux.com: Btrfs has been in development for a long time. Is it ready for prime time? I know some Linux distributions are using it as the default file system, whereas others don’t.

Chris Mason: It’s certainly the default in SUSE Linux Enterprise Server. SUSE invests a considerable amount of energy and people in supporting Btrfs, which I really appreciate. Red Hat hasn’t picked it up the same way. It’s one of those things where people pick up the features they care about most and the ones they want to build on top of.

Linux.com: In which areas does Btrfs make the most sense? If I am not wrong, Facebook also uses Btrfs?

Mason: Inside of Facebook, again we pick targeted places where we think the features of Btrfs are really beneficial to the workloads at hand. The big areas we are trying to focus on are system management tasks, the snapshotting type of things.

Linux.com: We all know that Facebook is a heavy user of Linux. Within the massive infrastructure of Facebook, where is Linux being used?

Mason: The easiest way to describe the infrastructure at Facebook is that it’s pretty much all Linux. The places we’re targeting for Btrfs are really management tasks around distributing the operating system, distributing updates quickly using the snapshotting features of Btrfs, using the checksumming features of Btrfs and so on.

We also have a number of machines running Gluster, using both XFS and Btrfs. The target there is primary data storage. One of the reasons they like Btrfs for the Gluster use case is that the data CRCs (cyclic redundancy checks) and the metadata CRCs give us the ability to detect problems in the hardware, such as silent data corruption. We have actually found a few major hardware bugs with Btrfs, so it’s been very beneficial to us.
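For readers who want to try the features Mason mentions, here is a minimal sketch of the snapshot, replication, and checksum-verification workflow using standard btrfs-progs commands; the paths and hostname are illustrative:

    # Create a read-only snapshot (send/receive requires read-only).
    btrfs subvolume snapshot -r / /snapshots/root-20161230
    # Replicate the snapshot to another machine, one way to distribute updates.
    btrfs send /snapshots/root-20161230 | ssh host2 btrfs receive /snapshots
    # Walk all data and metadata checksums to surface silent corruption.
    btrfs scrub start /
    btrfs scrub status /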

Linux.com: While we are talking about Linux at Facebook, I am curious how close to or how far from mainline you are, since no one runs a stock kernel; everyone creates a minor fork with tweaks and tuning for their own use cases.

Mason: From a Linux point of view, our primary goal with the Linux kernel is to track mainline as closely as we can. Our goal is to update the kernel at least once a year, and we’re trying to move to a more frequent update cycle than that. We have an upstream-first policy: if we want a feature in the kernel, it has to go to mainline before we use it.

Linux.com: Why do you need your own fork?

Mason: It’s impossible to run the mainline kernel as-is. You have to have some kind of fork: you fine-tune things, you tweak things, and you apply some patches for your own use cases. Our goal is to keep that fork as small as humanly possible. When we were moving from the 4.0 kernel to the 4.6 kernel, which we’re still in the process of moving to, I was really happy that we were able to get production workload performance on par with just one patch. That was a really big deal: being able to take basically a vanilla 4.6 kernel and get the same performance we had on our patched 4.0 kernel. And that’s really our long-term goal: to get closer and closer to being able to run mainline so that we can transition from one kernel to another very quickly.

Linux.com: We have all seen machines running really old Linux kernels, whereas you are aiming to run the latest one if you can. What’s the advantage?

Mason: The biggest benefit, as an engineering organization, is that we want to hire people who are doing upstream things. Developers want to work on new and innovative technologies, they want to do their work upstream, they want to come to these conferences, and they want to be a part of the community. We want to be able to get our work into the upstream kernel and then bring that back to Facebook. It’s easier to find and hire upstream developers, and it’s the best way to keep the maintenance workload down.

Linux.com: In the server space, we often hear from sysadmins that “once it’s installed and running don’t touch it,” which is contrary to what we see in modern IT infrastructure where the mantra seems to be move faster to stay secure.

Mason: I think that the scale of Facebook makes it easier for us to test things. It’s not that the testing work itself is easier, but we can spread that work over a large number of machines. We have the ability to take the testing work to what we call “Shadow Tiers.” On those Shadow Tiers, we can replay production traffic in a non-production environment, so we can be in a very safe place to check performance and ensure stability. We can ramp that traffic up, so I can start and say, “Okay, I’ll give it 5 percent of a replay of the production traffic,” go all the way up to 100 percent, and watch the performance curve as I go. I can get a very strong A/B comparison between two kernels along the way.

We have the tools to validate the kernels and to help test the upstream kernels. It’s easier to fix new and interesting bugs in upstream than it is to constantly just find old bugs that upstream has already fixed.

Linux.com: What are the things that keep you worried?

Mason: In terms of running the Linux Kernel or file systems, we test so well and there’s so much community support around Linux that I don’t really worry about running that.

Linux.com: You have been involved with Linux for a very long time, and Linux just celebrated its 25th anniversary. What do you think Linux has achieved in these 25 years?

Mason: The part that I give Linus the most credit for, aside from the technical contributions which are obvious, is his ability to create the kernel community of developers where people were so actively interested in moving forward from version to version. Linux didn’t fragment the way so many other projects have. It’s not all Linus, but I give Linus so much credit because with the processes that he set up, it was much easier to move forward with the kernel than it was to fork it and do something different.

I think that’s an important contribution that a lot of people overlook in terms of how the kernel community has stuck together and brought in new companies instead of pushing them away.

Get started with Linux development. Check out the “Introduction to Linux, Open Source Development, and GIT” course from The Linux Foundation.

Tools and Processes for Monitoring Containers

With the introduction of containers and microservices, monitoring solutions have to handle more ephemeral services and server instances than ever before. And while the infrastructure landscape has changed, operations teams still need to monitor the same information on the central processing unit (CPU), random access memory (RAM), hard disk drive (HDD), network utilization, and the availability of application endpoints.

While you can use an older tool or existing monitoring service for traditional infrastructures, there are newer cloud-based offerings that can ensure monitoring solutions are as scalable as the services being built and monitored. Many of these cloud-based and self-hosted tools are purpose-built for containers. No matter what solution or service you use, you still need to know how you’re going to collect the metrics you’re looking to monitor.

Read more at The New Stack

2016 SDN Trends: The Year of the Software-Defined WAN

In 2015, the networking world was abuzz about software-defined WAN and its potential. The buzz remained — and perhaps intensified — throughout 2016, as more enterprises deployed SD-WAN technology and the potential became reality.

It comes as little surprise, then, that SD-WAN was a consistent and popular topic in SearchSDN’s news and commentary throughout the year. Other percolating software-defined networking (SDN) trends were open source SDN, Cisco ACI versus VMware NSX, DevOps and training for the software-defined future. Here’s a glimpse into some of the SDN trends we covered throughout 2016.

Read more at TechTarget

How to Use Fail2Ban to Blunt Brute-force Attacks

With Fail2Ban you can automatically help your firewall protect your server.

WordFence, the WordPress security plugin company, tells me that unsophisticated brute-force attacks have doubled in the past three weeks. While WordFence can help keep your WordPress instances up and running, your server is still getting mauled. What can you do about it? You can use Fail2Ban to update your firewall rules and blunt brute-force attackers in real time.

It’s a shame that many of you haven’t heard of Fail2Ban, never mind used it. I’ve found it to be a very useful way to protect servers, and it’s just as easy to install and deploy.
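As a minimal sketch of what a Fail2Ban setup looks like, the jail.local override below enables the standard sshd jail; the log path is Debian-style, and the thresholds are illustrative rather than recommendations:

    [sshd]
    # Ban a host for an hour after five failed logins within ten minutes.
    enabled  = true
    port     = ssh
    logpath  = /var/log/auth.log
    maxretry = 5
    findtime = 600
    bantime  = 3600

Restarting the fail2ban service picks up the new jail, and banned addresses then appear as firewall rules managed by Fail2Ban.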

Read more at Linode

A Chip to Protect the Internet of Things

The Internet of Things offers the promise of all sorts of nifty gadgets, but each connected device is also a tempting target for hackers. As recent cybersecurity incidents have shown, IoT devices can be harnessed to wreak havoc or compromise the privacy of their owners. So Microchip Technology and Amazon.com have collaborated to create an add-on chip that’s designed to make it easier to combat certain types of attack—and, of course, encourage developers to use Amazon’s cloud-based infrastructure for the Internet of Things.

Read more at IEEE Spectrum

Google Open-Sources Test Suite to Find Crypto Bugs

Working with cryptographic libraries is hard, and a single implementation mistake can result in serious security problems. To help developers check their code for implementation errors and find weaknesses in cryptographic software libraries, Google has released a test suite as part of Project Wycheproof.

“In cryptography, subtle mistakes can have catastrophic consequences, and mistakes in open source cryptographic software libraries repeat too often and remain undiscovered for too long,” Google security engineers Daniel Bleichenbacher and Thai Duong wrote in a post announcing the project on the Google Security blog.
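Wycheproof itself ships targeted test vectors for specific algorithms and libraries; as a toy illustration of the underlying idea of vector-based testing, here is a known-answer check in Go against the published FIPS 180-2 vector for SHA-256("abc"):

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
    )

    func main() {
        // Published FIPS 180-2 known-answer vector for SHA-256("abc").
        want := "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

        sum := sha256.Sum256([]byte("abc"))
        got := hex.EncodeToString(sum[:])

        if got != want {
            fmt.Println("implementation bug: got", got)
            return
        }
        fmt.Println("vector passed")
    }

A real suite like Wycheproof goes much further, feeding libraries edge-case and maliciously constructed inputs rather than just the happy path.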

Read more at InfoWorld

2017’s Big Question: Who Pays for the Blockchain?

2016 saw the rise of the blockchain evangelist. Not since the heady dotcom days have we seen so many experts hyping a new technology. But, amid the hype, little attention has been paid to an important question. Who pays for the blockchain?

This consideration is especially important to anyone evaluating blockchain technology for their organization.

The blockchain buzz began in 2015. Bitcoin’s association with illegal activities earned it a bad reputation. This led startups to brand themselves as blockchain companies. They promised to deliver the benefits of the “technology behind bitcoin” without the undesirable baggage. Most didn’t understand that the technology behind bitcoin has existed for years.

Bitcoin’s success is a result of the network’s economic incentives.

Read more at CoinDesk

Kubernetes: A True Cloud Platform

The Kubernetes community is building a platform that will make application development completely cloud infrastructure agnostic. During his CloudNativeCon keynote in November, Sam Ghods, co-founder of Box, said Kubernetes’ combination of portability and extensibility puts it in a class of its own for cloud application development.

“We finally have a portable abstraction to work against in cloud infrastructure,” he said.

Ghods compared Kubernetes to other platforms like Linux, which provides consistency across almost any hardware; Java, which runs on almost any operating system; and Twilio, which provides a single platform across dozens of complicated telephony services. The whole idea is to push the messy bits into the background and create a consistent, predictable layer to build on.

“A platform abstracts away a messy problem so you can build on top of it,” Ghods said.

Currently, each of the major cloud infrastructure providers — like Amazon Web Services, Google, Microsoft Azure, and OpenStack — offers different solutions for autoscaling, load balancing, and remote storage, and no solution at all for service discovery.

As a platform, Kubernetes rises above the mess and provides a single layer where developers can be certain that the specifications needed to run their application will always exist. That way, they don’t have to pay attention to how each infrastructure fulfills the application’s requirements.

“Now, I can write one JSON spec and submit that to any Kubernetes cluster running anywhere and be able to recreate exactly the right topology and exactly the right infrastructure that I need,” Ghods said.
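As a concrete illustration of such a spec, here is a minimal Pod manifest in the Kubernetes v1 API; the name and image are arbitrary:

    {
      "apiVersion": "v1",
      "kind": "Pod",
      "metadata": { "name": "hello-web" },
      "spec": {
        "containers": [
          {
            "name": "web",
            "image": "nginx",
            "ports": [ { "containerPort": 80 } ]
          }
        ]
      }
    }

Submitted with kubectl create -f pod.json, the same file produces the same topology on any conformant cluster, which is exactly the portability Ghods describes.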

That’s the portability piece of the platform, Ghods said. For extensibility, which Ghods said was the piece he was personally most excited about, the community is constantly releasing new features and projects that set Kubernetes apart. One such feature is Dashboard, a UI that shows resource utilization. Another key component under development is cluster federation for load balancing, and the etcd operator, recently introduced by CoreOS, ensures the application is running in the desired state on the cluster.

Ghods said he’s been implementing Kubernetes at Box for the last two years, and the inclusivity and transparency of the community are what set the project apart from other attempts at creating a stable cloud application platform.

“Kubernetes has the opportunity to be the new cloud platform,” Ghods said. “I think the tooling we’re seeing is just the tip of the iceberg. I think the amount of innovation and leverage that’s going to come from being able to standardize on Kubernetes as a platform is incredibly exciting, more exciting than anything I’ve seen in the last 10 years of working on the cloud.

“We have an opportunity here in this room to do what AWS did for infrastructure, but this time in an open, universal, community-driven way,” Ghods said. “We can build tooling that people today only dreamt of having and truly uplevel the next generation of developers.”

Do you need training to prepare for the upcoming Kubernetes certification? Pre-enroll today to save 50% on Kubernetes Fundamentals (LFS258), a self-paced, online training course from The Linux Foundation. Learn More >>

Saving Application State in the Stateless Container World

Running applications in our brave new container orchestration world is like managing herds of fireflies; they blink in and out. There is no such thing as uptime anymore. Applications run, and when they fail, replacements launch from vanilla images. Easy come, easy go. But if your application needs to preserve state, it must either take periodic snapshots or have some other method of recovering state. Snapshots are far from ideal, as you will likely lose data, as with any non-graceful shutdown. This is not optimal, so Mesosphere’s Isabel Jimenez and Kapil Arya presented some new ideas at LinuxCon North America.

Arya explains how managing stateless applications is different from managing stateful applications: “When you scale up, you basically launch the new instances, or new loads, or a new cluster. They are pretty much starting all from the vanilla image, the idea being that everything is immutable. When you want to scale down, you just kill the extra instances. If the need comes and you want to, say, schedule some high-priority task, you can easily kill the additional instances that are no longer needed or that need to be preempted, and your high-priority task can actually get the node or the resources right away.”

Stateful applications are different. “To kill an application that is already running, if it’s not a graceful shutdown, then you lose the computation time, and so on. Basically, what that means is, if you have a high-priority task coming in, then killing some instances of the stateful application will definitely result in some compute time loss.”

Container orchestration tools are more optimized for stateless applications. How can we make it better for stateful applications? Arya says, “Make them stateless.” How? One way is to start from scratch. Rewrite your stateful apps to be stateless. That is probably not going to happen. Instead, you could offload the job of managing state to your container orchestration framework and migrate your processes. “We’ll see what actually is involved in doing such a migration. This is a very general recipe that pretty much works on all these scenarios. You first pause the running process, or the container, or the virtual machine, so that the state is now immutable. You then take a snapshot of the current state. You copy over the snapshot to the target node, or the new data center, or the new cluster. Finally, you restart from that snapshot that you just took, and you have the application or the virtual machine up and running.”
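To make the shape of that recipe concrete, here is a minimal sketch in Go of the four steps Arya lists; the types and helper names are hypothetical and come from neither Mesos nor the talk:

    package main

    import "fmt"

    // snapshot stands in for a serialized copy of a task's state.
    type snapshot struct {
        taskID string
    }

    // Each helper is a stub for one step of the recipe.
    func pause(taskID string) {
        fmt.Println("1. pause", taskID, "so its state becomes immutable")
    }

    func checkpoint(taskID string) snapshot {
        fmt.Println("2. snapshot the current state of", taskID)
        return snapshot{taskID: taskID}
    }

    func transfer(s snapshot, node string) {
        fmt.Println("3. copy the snapshot of", s.taskID, "to", node)
    }

    func restore(s snapshot, node string) {
        fmt.Println("4. restart", s.taskID, "from the snapshot on", node)
    }

    // migrate strings the four steps together.
    func migrate(taskID, target string) {
        pause(taskID)
        s := checkpoint(taskID)
        transfer(s, target)
        restore(s, target)
    }

    func main() {
        migrate("stateful-task-42", "node-b")
    }

In a real system, the checkpoint step is performed by a checkpoint/restore tool rather than a stub, and the snapshot is an on-disk image of the process’s memory and open file descriptors.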

Taking the snapshot is referred to as checkpointing. Ideally this happens very quickly, in milliseconds, so that nobody notices any interruptions or delays. Several factors influence this, especially the memory footprint of the application. Arya says that “If you have a memory footprint of a gigabyte, and you’re writing a checkpoint image to a regular disk, then assuming there’s roughly 100 megabytes per second, it’ll take 10 seconds to dump the checkpoint image. If you have some fancy hardware back end, like Cluster File System, then you can get pretty amazing speeds like 60 gigabytes per second or so.”

Watch the complete presentation (below) for more details on how this functionality is being built into Apache Mesos, and to see it demonstrated.

LinuxCon videos

Testing Distributed Systems in Go

What is etcd

etcd is a key-value store for the most critical data of distributed systems. Use cases include Container Linux by CoreOS, which supports automatic Linux kernel updates: CoreOS uses etcd to store semaphore values that make sure only a subset of the cluster is rebooting at any given time. Kubernetes uses etcd to store cluster state for service discovery and cluster management, and it uses etcd’s watch API to monitor critical configuration changes. Consistency is the key to ensuring that services schedule and operate correctly.
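As a small, hedged illustration of the watch API mentioned above, here is a sketch in Go using the etcd clientv3 package; the endpoint and key prefix are illustrative:

    package main

    import (
        "context"
        "fmt"
        "time"

        "github.com/coreos/etcd/clientv3"
    )

    func main() {
        // Connect to a local etcd member.
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"127.0.0.1:2379"},
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            panic(err)
        }
        defer cli.Close()

        // Watch every key under a hypothetical configuration prefix; etcd
        // streams an event for each change as soon as it is committed.
        for resp := range cli.Watch(context.Background(), "/config/", clientv3.WithPrefix()) {
            for _, ev := range resp.Events {
                fmt.Printf("%s %q = %q\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
            }
        }
    }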

Reliability and robustness are etcd’s highest priorities. This post explains how etcd is tested under various failure conditions.

Read more at Gopher Academy