
Upcoming Btrfs Features for Linux Containers

Brandon Philips is CTO at CoreOS, a new Linux distribution that has been rearchitected to provide features needed to run massive server deployments.

Containers have made huge progress in the last year with the addition of user namespaces to the kernel, the introduction of Docker, the release of LXC 1.0, and the maturing of Checkpoint/Restore In Userspace (CRIU). At the annual Linux Foundation Collaboration Summit last month, a number of people were talking about containers and their application in the Linux ecosystem.

In the hallway track I had a chance to catch up with two btrfs developers: Chris Mason and Zach Brown. With containers on my mind I asked them about two important features in btrfs and how they see them developing and being used.

First, I asked about subvolumes and snapshots, in particular the workloads that Docker puts btrfs through. Docker application containers have the very useful property that every time you start one, it gets a fresh, clean filesystem. A simple way to implement this would be to copy the "gold master container" into a new directory and start the new container there. If you had a lot of running containers, this duplication would get expensive in both copy time and disk space. Instead of this naive approach, Docker can use btrfs subvolumes and snapshots.
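For contrast, the naive approach can be sketched in a few lines of Python. This is an illustration, not Docker's code: `naive_container_root` and `tree_size` are hypothetical helpers that copy a "gold master" directory with `shutil.copytree` and report how many bytes had to be duplicated, which is the cost that grows with every container started.

```python
import os
import shutil


def tree_size(root: str) -> int:
    """Total size in bytes of all regular (non-symlink) files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def naive_container_root(gold_master: str, dest: str) -> int:
    """Hypothetical helper: give a container a fresh filesystem by copying
    every file from the gold master. Returns the number of bytes duplicated."""
    shutil.copytree(gold_master, dest, symlinks=True)
    return tree_size(dest)
```

Starting N containers this way costs N full copies of the gold master, in both time and disk space.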

By using a btrfs snapshot of the "gold master container", Docker can create a new playground for each container with a single syscall and avoid the cost of duplicating all of that data into another directory. It sounds like the perfect use of the feature. But I wanted to hear it from Chris himself, so I asked him what he thought of the "Docker workload". In his words: "I really want to see this use of btrfs and its features to be successful, please let me know if you run into any problems. I want Docker's workload to work great."
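The "single syscall" is an `ioctl(2)` issued on the destination directory. The Python sketch below assumes a Linux host on which the gold master is already a btrfs subvolume. The constants mirror `struct btrfs_ioctl_vol_args_v2` and `BTRFS_IOC_SNAP_CREATE_V2` from `<linux/btrfs.h>`, the request number assumes the generic `_IOW` encoding used on x86 and arm, and `snapshot` and `pack_snapshot_args` are hypothetical helper names. In day-to-day use, `btrfs subvolume snapshot <source> <dest>` from btrfs-progs issues an equivalent ioctl for you.

```python
import fcntl
import os
import struct

# Values from <linux/btrfs.h>.
BTRFS_IOCTL_MAGIC = 0x94
BTRFS_SUBVOL_NAME_MAX = 4039

# struct btrfs_ioctl_vol_args_v2: __s64 fd; __u64 transid; __u64 flags;
# a 32-byte union; char name[BTRFS_SUBVOL_NAME_MAX + 1]. Total: 4096 bytes.
VOL_ARGS_V2 = struct.Struct(f"=qQQ32s{BTRFS_SUBVOL_NAME_MAX + 1}s")


def _iow(type_: int, nr: int, size: int) -> int:
    """Generic Linux _IOW() encoding (x86, arm); e.g. powerpc and mips differ."""
    return (1 << 30) | (size << 16) | (type_ << 8) | nr


BTRFS_IOC_SNAP_CREATE_V2 = _iow(BTRFS_IOCTL_MAGIC, 23, VOL_ARGS_V2.size)


def pack_snapshot_args(source_fd: int, name: str) -> bytearray:
    """Build the ioctl argument for a writable snapshot named `name`."""
    encoded = name.encode()
    if not encoded or len(encoded) > BTRFS_SUBVOL_NAME_MAX or b"/" in encoded:
        raise ValueError(f"invalid snapshot name: {name!r}")
    # transid and flags are 0 (writable snapshot); struct zero-pads the name
    # field, which also NUL-terminates it.
    return bytearray(VOL_ARGS_V2.pack(source_fd, 0, 0, b"", encoded))


def snapshot(gold_master: str, parent_dir: str, name: str) -> None:
    """Hypothetical helper: snapshot the gold_master subvolume as parent_dir/name."""
    src = os.open(gold_master, os.O_RDONLY | os.O_DIRECTORY)
    try:
        dst = os.open(parent_dir, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # The single call: the new subvolume shares the source's data
            # copy-on-write instead of copying it. A mutable buffer is used
            # because fcntl.ioctl caps immutable arguments at 1024 bytes.
            fcntl.ioctl(dst, BTRFS_IOC_SNAP_CREATE_V2, pack_snapshot_args(src, name))
        finally:
            os.close(dst)
    finally:
        os.close(src)
```

Unlike the copy-based approach, the time this takes does not grow with the size of the gold master, because no file data is read or written.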

It was great to hear this sort of affirmation from the maintainer of btrfs. It was icing on the cake, since a few weeks earlier we had decided to make btrfs the root filesystem for CoreOS too.

The second topic was cryptographic hashes of filesystem data. Currently, btrfs uses CRC checksums, which are great for catching accidental data corruption. But a CRC can't be used like a SHA hash to cryptographically verify that the contents weren't changed. Extending btrfs checksumming to support cryptographic hashes would open up interesting possibilities, such as mounting a filesystem only if its contents match the exact hash you expected.
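To make the difference concrete, here is a small Python demonstration; it is not code from btrfs. The btrfs default checksum is CRC32C, while Python's standard library only ships zlib's CRC-32, but both share the property that matters here: a CRC is linear over GF(2), so anyone who can edit a block can also choose four bytes that restore the original checksum. `forge_crc32` is a hypothetical helper that solves for those bytes with Gaussian elimination; SHA-256 offers no comparable shortcut.

```python
import hashlib
import zlib


def forge_crc32(data: bytes, target: int) -> bytes:
    """Rewrite the last 4 bytes of data so that zlib.crc32(result) == target."""
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 bytes to adjust")
    base = zlib.crc32(bytes(n))
    # For equal-length inputs, crc32(m ^ e) == crc32(m) ^ crc32(e) ^ crc32(zeros),
    # so flipping each of the last 32 bits changes the CRC by a fixed column.
    columns = []
    for i in range(32):
        e = bytearray(n)
        e[n - 4 + i // 8] = 1 << (i % 8)
        columns.append(zlib.crc32(bytes(e)) ^ base)

    # Gaussian elimination over GF(2). Each basis entry is (vector, mask), where
    # mask records which columns were XORed together to produce the vector.
    basis = []
    for i, vec in enumerate(columns):
        mask = 1 << i
        for bvec, bmask in basis:
            if vec ^ bvec < vec:  # bvec's leading bit is set in vec
                vec ^= bvec
                mask ^= bmask
        if vec:
            basis.append((vec, mask))
            basis.sort(reverse=True)  # keep leading bits in descending order

    want = zlib.crc32(data) ^ target
    flips = 0
    for bvec, bmask in basis:
        if want ^ bvec < want:
            want ^= bvec
            flips ^= bmask
    if want:
        raise ValueError("no solution")  # the 32x32 system is invertible for CRC-32

    out = bytearray(data)
    for i in range(32):
        if flips >> i & 1:
            out[n - 4 + i // 8] ^= 1 << (i % 8)
    return bytes(out)


original = b"approved release image, block 0\n"
tampered = forge_crc32(b"attacker-controlled contents!!!\n" + bytes(4), zlib.crc32(original))

print(zlib.crc32(tampered) == zlib.crc32(original))  # True: the CRC cannot tell them apart
print(hashlib.sha256(tampered).digest() == hashlib.sha256(original).digest())  # False
```

The CRC still does its job against random bit flips; it simply was never designed to resist someone who is choosing the bytes.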

Zach and Chris hope to start work on this feature this year, and they said that btrfs was designed with this sort of use case in mind: the metadata space for checksums is 256 bits, with the possibility to expand.
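A quick way to see what a 256-bit checksum slot can hold is to compare digest sizes. This sketch assumes nothing about the btrfs on-disk layout beyond the 256-bit figure quoted by the developers; `digest_bits` and `CSUM_SLOT_BITS` are hypothetical names.

```python
import hashlib

CSUM_SLOT_BITS = 256  # checksum space quoted by the btrfs developers


def digest_bits(name: str) -> int:
    """Output size in bits of CRC-32 or of a hashlib algorithm."""
    if name == "crc32":
        return 32
    return hashlib.new(name).digest_size * 8


for algo in ("crc32", "md5", "sha1", "sha256", "sha512"):
    bits = digest_bits(algo)
    verdict = "fits" if bits <= CSUM_SLOT_BITS else "needs the slot expanded"
    print(f"{algo:>7}: {bits:3d} bits, {verdict}")
```

SHA-256 fills the slot exactly, while a 512-bit hash such as SHA-512 would need the expansion the developers mentioned.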

This feature would be useful to ensure that your distro partition wasn't tampered with by an attacker, or to verify that the copy of the files you have on disk is the exact version you expect. On CoreOS, this would make it very straightforward for us to use btrfs exclusively and verify that our read-only updates were applied correctly.
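Until the filesystem can do this itself, the idea can be approximated in userspace: hash a tree and refuse to use it unless the digest matches a known-good value. This Python sketch is not how btrfs would implement the feature. `tree_digest` and `verify_tree` are hypothetical helpers, and the manifest format (relative path, file size and SHA-256 of contents, in sorted order) is an assumption chosen so that renames, truncations and edits all change the result.

```python
import hashlib
import os


def tree_digest(root: str) -> str:
    """SHA-256 over a sorted manifest of every file under root.

    Symlinks are followed, and permissions and ownership are not covered;
    a hardened version would hash that metadata too.
    """
    rel_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel_paths.append(os.path.relpath(os.path.join(dirpath, name), root))

    outer = hashlib.sha256()
    for rel in sorted(rel_paths):
        inner = hashlib.sha256()
        size = 0
        with open(os.path.join(root, rel), "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                inner.update(chunk)
                size += len(chunk)
        outer.update(f"{rel}\0{size}\0{inner.hexdigest()}\n".encode())
    return outer.hexdigest()


def verify_tree(root: str, expected: str) -> None:
    """Raise if the tree under root does not match the expected digest."""
    actual = tree_digest(root)
    if actual != expected:
        raise RuntimeError(f"{root}: digest {actual} != expected {expected}")
```

An update system would record the digest when the image is built and call `verify_tree` before trusting the files; doing this in the filesystem instead would avoid re-reading every file.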

With all of this progress on containers it is great to know that we have a filesystem that plans to keep up. Thanks to Chris and Zach for explaining their plans and aspirations for btrfs and container goodness.



  • lsatenstein Said:

    Timings for hashes. I wrote some routines to hash directory contents. For a large directory (6400 files), on my system, crc32 took about 12 seconds, md5sum 16.4 seconds and sha1 about 35 seconds. My view is that an md5sum hash is a good compromise, for two reasons. a) The system id for the file system is known. (In case files are moved off the system, the system id is a good tag.) The system id is not part of the hash, though it could be used to salt the calculation. b) Md5sum uses a 32-bit word. Files within a directory are localized to that directory, and it is rare to find a directory with 2^(32-1) files, such that two unique files within a directory would yield identical hashes. The hash algorithm I used included the file size along with the hash of the file contents. In all my testing, when a duplicate hash was detected, the files turned out to be duplicates of each other. What favors md5sum for me is its much lower CPU consumption and its detection of file corruption.
