How Facebook Uses Linux and Btrfs: An Interview with Chris Mason


Chris Mason is the principal author of Btrfs, the open source file system that is the default file system in SUSE Linux Enterprise. Mason started working on Btrfs at Oracle and then moved to Facebook, where he has continued to work on the file system as a member of the company’s Linux kernel team. When Facebook has new kernels that need to go out, Mason helps make sure that everything has been properly tested and meets performance needs.

We sat down with Mason to learn more about the status of Btrfs and how Facebook is using Linux and Btrfs. Here is an edited version of that interview.

Btrfs has been in development for a long time. Is it ready for prime time? I know some Linux distributions are using it as the default file system, whereas others don’t.

Chris Mason: It’s certainly the default in SUSE Linux Enterprise Server. SUSE devotes considerable energy and people to supporting Btrfs, which I really appreciate. Red Hat hasn’t picked it up the same way. It’s one of those things where people pick up the features they care about most and the ones they want to build on top of.

In which areas does Btrfs make the most sense? If I’m not mistaken, Facebook also uses Btrfs?

Mason: Inside Facebook, again, we pick targeted places where we think the features of Btrfs really benefit the workloads at hand. The big areas we are focusing on are system management tasks, the snapshotting type of things.

We all know that Facebook is a heavy user of Linux. Within Facebook’s massive infrastructure, where is Linux being used?

Mason: The easiest way to describe the infrastructure at Facebook is that it’s pretty much all Linux. The places we’re targeting for Btrfs are really management tasks around distributing the operating system, distributing updates quickly using the snapshotting features of Btrfs, using the checksumming features of Btrfs and so on.

We also have a number of machines running Gluster, using both XFS and Btrfs. The target there is primary data storage. One of the reasons the team likes Btrfs for the Gluster use case is that the data and metadata CRCs (cyclic redundancy checks) let us detect hardware problems such as silent data corruption. We have actually found a few major hardware bugs with Btrfs, so it’s been very beneficial.

While we are talking about Linux at Facebook, I’m curious how close you stay to mainline, since no one runs the stock kernel; everyone maintains a minor fork with tweaks and tuning for their use case.

Mason: From a Linux point of view, our primary goal with the kernel is to track mainline as closely as we can. Our goal is to update the kernel at least once a year, and we’re trying to move to a more frequent update cycle than that. We have an upstream-first policy: we get changes into mainline before we use them. If we want a feature in the kernel, it has to go to mainline first.

Why do you need your own fork?

Mason: It’s impossible to run the mainline kernel unmodified. You have to have some kind of fork: you fine-tune things, you tweak things, and you apply some patches for your own use cases. Our goal is to keep that fork as small as humanly possible. When we were moving from the 4.0 kernel to the 4.6 kernel, a move we’re still in the middle of, I was really happy that we were able to get production workload performance on par with just one patch. That was a really big deal. We could take a basically vanilla 4.6 kernel and get the same performance we had on our patched 4.0 kernel. And that’s really our long-term goal: to get closer and closer to running plain mainline, so that we can transition from one kernel to the next very quickly.

We have all seen machines running really old Linux kernels, whereas you aim to run the latest one if you can. What’s the advantage?

Mason: The biggest benefit, as an engineering organization, is that we want to hire people who are doing upstream things. Developers want to work on new and innovative technologies, they want to do their work upstream, they want to come to these conferences, and they want to be part of the community. We want to be able to get our work into the upstream kernel and then bring it back to Facebook. It’s easier to find and hire upstream developers, and it’s the best way to keep the maintenance workload down.

In the server space, we often hear from sysadmins, “once it’s installed and running, don’t touch it,” which runs contrary to modern IT infrastructure, where the mantra seems to be “move faster to stay secure.”

Mason: I think the scale of Facebook makes it easier for us to test things. It’s not that the testing work itself is easier, but we can spread that work over a large number of machines. We can take the testing work to what we call “Shadow Tiers.” On those Shadow Tiers, we replay production traffic in a non-production environment, so we’re in a very safe place to check performance and ensure stability. We can ramp that traffic up: I can start by saying, “Okay, I’ll give it 5 percent of the replayed production traffic,” then go all the way up to 100 percent and watch the performance as I go. That gives me a very strong A/B comparison between two kernels along the way.

We have the tools to validate our kernels and to help test the upstream kernels. It’s easier to fix new and interesting bugs upstream than to keep finding old bugs that upstream has already fixed.

What are the things that keep you worried?

Mason: In terms of running the Linux kernel or file systems, we test so well, and there’s so much community support around Linux, that I don’t really worry about running them.

You have been involved with Linux for a very long time, and Linux just celebrated its 25th anniversary. What do you think Linux has achieved in these 25 years?

Mason: The part I give Linus the most credit for, aside from the technical contributions, which are obvious, is his ability to build a kernel developer community in which people were actively interested in moving forward from version to version. Linux didn’t fragment the way so many other projects have. It’s not all Linus, but I give him a lot of credit, because the processes he set up made it much easier to move forward with the kernel than to fork it and do something different.

I think that’s an important contribution that a lot of people overlook in terms of how the kernel community has stuck together and brought in new companies instead of pushing them away.
