June 10, 2008

Save disk space - use compFUSEd to transparently compress filesystems

Author: Ben Martin

The Filesystem in Userspace (FUSE) project allows you to install new filesystems without touching your Linux kernel. The filesystems run as regular programs, allowing them to use shared libraries and perform tasks that would be difficult from inside the Linux kernel. FUSE filesystems look just like regular filesystems to other applications on the machine. In this article I'll look at compFUSEd, which is a compressed FUSE filesystem. Using compFUSEd can save a significant amount of disk space for files that are highly compressible, such as many text documents and executable files.

CompFUSEd is designed as an overlay filesystem. This means that it takes an existing "base" filesystem and presents the same filesystem with some modifications. In this case the modification is to (de)compress the files. CompFUSEd takes the data that is written to it, compresses it, and passes it off to an underlying "base" filesystem for storage. When you read a file through compFUSEd, it will read that file from the base filesystem and decompress it before giving it to you. That means applications can use a compFUSEd filesystem without knowing anything about compression or even that the data is compressed when stored on disk.
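To make the overlay idea concrete, here is a toy model in Python. It is not how compFUSEd actually stores data: compFUSEd runs through FUSE and splits files into chunks. The class name CompressingOverlay and the one-compressed-blob-per-file layout are assumptions made purely for illustration. Callers only ever see plain bytes, while the "base" directory only ever holds zlib-compressed data.

```python
import os
import tempfile
import zlib


class CompressingOverlay:
    """Hypothetical toy overlay: plain bytes on top, compressed bytes in the
    base directory. Not compFUSEd's on-disk format."""

    def __init__(self, base_dir: str) -> None:
        self.base_dir = base_dir

    def _base_path(self, name: str) -> str:
        # Map a name seen through the overlay to the file that stores it in the base.
        return os.path.join(self.base_dir, name)

    def write(self, name: str, data: bytes) -> None:
        # Compress on the way down to the base filesystem.
        with open(self._base_path(name), "wb") as f:
            f.write(zlib.compress(data))

    def read(self, name: str) -> bytes:
        # Decompress on the way back up, so the caller never sees compressed bytes.
        with open(self._base_path(name), "rb") as f:
            return zlib.decompress(f.read())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        fs = CompressingOverlay(base)
        text = b"highly compressible text " * 1000
        fs.write("notes.txt", text)
        stored = os.path.getsize(os.path.join(base, "notes.txt"))
        print(len(text), "bytes written,", stored, "bytes stored in the base")
        assert fs.read("notes.txt") == text
```

The application-facing side (write and read) is the same as for an ordinary file. That transparency is what lets unmodified programs use a compFUSEd mount.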

There are currently no packages of compFUSEd for Ubuntu, Fedora, or openSUSE. For this article I'll compile from source on a 64-bit Fedora 8 machine using version 200712321. The download page shows the compFUSEd tarball as cf-GISMO-date. Presumably the cf prefix is for compFUSEd. Expanding the tarball results in a directory called CompFused without any date version information in the directory name.

CompFUSEd has support for the compression libraries zlib, bzip2, lzo, and lzo2, but I could not get support for the latter two to compile on Fedora 8. You must edit the Makefile to exclude support for any compression libraries that you find undesirable or that fail to compile on your Linux installation. By default the compFUSEd build also attempted to link to a profiler library, so you have to edit the Rules.make file to remove this link dependency. Finally, the build broke with errors about symbols, shared objects, and the lack of the -fPIC (position-independent code) option. PIC code can be loaded at different locations in memory. This is useful for shared libraries because it allows a library to be loaded at another address when multiple libraries would otherwise want the same one. Adding -fPIC to the CFLAGS in Rules.make and running make clean all resolved this problem, and my compilation then succeeded.

$ tar xzvf /.../cf-GISMO-200712321.tgz
$ cd ./CompFused/Gismo/
$ vi Makefile
# Set to 1 to include support

$ vi Rules.make
CFLAGS= -Wall -pedantic -g -D_FILE_OFFSET_BITS=64 -fPIC

LIBS= -lfuse

$ make

There is no make install target, so you must perform this step by hand:

# mkdir -p /usr/local/etc/
# cp compFUSEd.conf /usr/local/etc/compFUSEd.conf
# cp cf_main /usr/local/bin/compFUSEd
# mkdir /usr/local/lib/compFUSEd/
# cp -av plugins /usr/local/lib/compFUSEd/

CompFUSEd will check for a configuration file at /usr/local/etc/compFUSEd.conf, and it will also look for a file called .compFUSEd (without a .conf extension) in your home directory. I copied the default configuration and set up a test mountpoint in my home directory. In the file, the string in square brackets is the path where you want to mount the compressed filesystem, and the key = value settings that follow it set the options for that mountpoint. The backend option tells compFUSEd which directory to read the compressed files from and write them to. The compression and writer options specify how to compress your files and what policy to use when data is written to them.

Internally, compFUSEd breaks compressed files up into pieces called chunks. The chunk_size and chunk_max parameters specify how large each chunk is in bytes (before compression) and how many chunks can be held in RAM for each file. The writer plugin is responsible for saving the modified chunks of each file. Keeping the writer plugin separate from the core logic of compFUSEd is a great design, because handling the storage of chunks efficiently is a complicated task whose best solution will likely differ depending on how you intend to use the filesystem. The exclude parameter lists file extensions that compFUSEd should not compress. Unfortunately, you cannot specify the files to exclude using regular expressions.
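The chunk and exclude settings can be sketched in a few lines of Python. This assumes each chunk_size slice of the uncompressed file is compressed independently with zlib, which matches the description above but is not taken from compFUSEd's source. chunk_max, the per-file limit on chunks held in RAM, is not modeled. The function names and the EXCLUDE set are hypothetical.

```python
import zlib

CHUNK_SIZE = 8192  # same value as chunk_size in this article's config
EXCLUDE = {"gz"}   # same as "exclude = gz"


def should_compress(filename: str, exclude=EXCLUDE) -> bool:
    # Plain suffix match on the extension; no regular expressions,
    # mirroring compFUSEd's limitation. "tar.gz" entries also work.
    return not any(filename.endswith("." + ext) for ext in exclude)


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    # Each fixed-size slice of the uncompressed data is compressed on its own,
    # so one chunk can later be read or rewritten without touching the others.
    return [zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def read_chunk(chunks: list[bytes], index: int) -> bytes:
    # Random access: only the requested chunk is decompressed.
    return zlib.decompress(chunks[index])
```

For example, 25,600 bytes of data split into four chunks: three full 8K chunks and one 1K tail. Reading byte 10,000 then only requires decompressing chunk 1 (counting from 0). The trade-off is that smaller chunks give cheaper random access but usually compress less well, because each chunk starts with an empty compression dictionary.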

$ cd ~
$ cp /usr/local/etc/compFUSEd.conf .compFUSEd
$ vi .compFUSEd
backend = /home/ben/.compFUSEd_test.backend
compression = /usr/local/lib/compFUSEd/plugins/cf_zlib.so
writer = /usr/local/lib/compFUSEd/plugins/writer_isimple.so
chunk_size = 8192 # That's 8K per chunk (uncompressed)
chunk_max = 100 # Up to 100 chunks of 8K open per file
exclude = gz # On this mount we compress everything except .gz files
$ compFUSEd ~/compFUSEd_test
Reading config file /usr/local/etc/compFUSEd.conf
done reading configuration file.
Reading config file /home/ben/.compFUSEd
done reading configuration file.
backend /home/ben/.compFUSEd_test.backend
compression /usr/local/lib/compFUSEd/plugins/cf_zlib.so
chunk writer /usr/local/lib/compFUSEd/plugins/writer_isimple.so
chunk size 8192
chunk max 100
compression threshold 0
| compFUSEd GISMO version 1
| by Johan Parent
| Please send bug reports, suggestion to compFUSEd.contact@gmail.com
| - DISCLAIMER: read it!
| - You run this program at your own risk!
| - Treat your backups with respect :-P
| - DO NOT store valuable data on this EXPERIMENTAL filesystem
| - NEVER modify anything in the backend directory
| while the compFUSEd filesystem is mounted
| Feedback at above mentioned address is welcome

$ df -h ~ ~/compFUSEd_test
Filesystem            Size  Used Avail Use% Mounted on
                       16G   11G  4.3G  71% /
compFUSEd              16G     0   16G   0% /home/ben/compFUSEd_test

$ fusermount -u ~/compFUSEd_test
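If you want to check a compFUSEd setup from a script, the format described earlier happens to look like INI: a bracketed mountpoint followed by key = value lines with trailing # comments. The sketch below assumes the file really is INI-compatible, which compFUSEd's documentation does not confirm. It reads the files in the same order as the log above (system file first, then ~/.compFUSEd) and assumes a later file overrides an earlier one. The load_mounts name is hypothetical. The [/home/ben/compFUSEd_test] header in SAMPLE is also hypothetical, because the listing above does not show its bracketed line.

```python
import configparser
import os
import sys
import tempfile


def load_mounts(paths):
    """Parse compFUSEd-style config files into {mountpoint: options}.

    Assumes an INI-like layout: a "[mountpoint]" header followed by
    "key = value" lines, with "#" starting a trailing comment."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#",),  # strip "# That's 8K per chunk ..."
        interpolation=None,              # treat any "%" in paths literally
    )
    # Files are read in order; a key set again in a later file wins, which is
    # how this sketch models ~/.compFUSEd overriding the system-wide file.
    parser.read([os.path.expanduser(p) for p in paths])
    mounts = {}
    for mountpoint in parser.sections():
        opts = parser[mountpoint]
        mounts[mountpoint] = {
            "backend": opts.get("backend"),
            "compression": opts.get("compression"),
            "writer": opts.get("writer"),
            "chunk_size": opts.getint("chunk_size"),  # None if not set
            "chunk_max": opts.getint("chunk_max"),
            "exclude": opts.get("exclude", "").split(),
        }
    return mounts


SAMPLE = """\
[/home/ben/compFUSEd_test]
backend = /home/ben/.compFUSEd_test.backend
compression = /usr/local/lib/compFUSEd/plugins/cf_zlib.so
writer = /usr/local/lib/compFUSEd/plugins/writer_isimple.so
chunk_size = 8192 # That's 8K per chunk (uncompressed)
chunk_max = 100 # Up to 100 chunks of 8K open per file
exclude = gz # On this mount we compress everything except .gz files
"""

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "compFUSEd.conf")
        with open(path, "w") as f:
            f.write(SAMPLE)
        for mountpoint, opts in load_mounts([path]).items():
            print(mountpoint, opts["chunk_size"], opts["exclude"], file=sys.stdout)
```

On a real machine you would call load_mounts(["/usr/local/etc/compFUSEd.conf", "~/.compFUSEd"]). If compFUSEd allows settings outside any bracketed section, configparser will reject the file, and the sketch would need adjusting.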

The two writer plugins available are called writer_isimple and writer_smarter. The core difference between them is that writer_smarter will not compact a file unless good compression is possible. File compaction is the process of moving the compressed chunks to remove unused space in the file. For example, if an application reads the third chunk of data and writes new data to it, compFUSEd will compress this new data as chunk three. If the newly compressed chunk is smaller than the old one, the chunks after chunk three might have to be moved closer to the start of the file. This can be expensive if the file is large and chunks near the start of the file are modified frequently. A common pattern for file modification is to rename an existing file and write the whole file contents again. Many text editors work this way, so the writer plugin you choose will not make any difference if you only edit files on the compFUSEd filesystem with a text editor.
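A back-of-the-envelope model shows why compaction gets expensive. It assumes compressed chunks are packed back to back in the backend file, so any change in one chunk's compressed size shifts every later chunk. compFUSEd's real layout is not modeled, and neither is any slack that writer_smarter might leave instead of compacting. The function names are hypothetical.

```python
def layout(sizes: list[int]) -> list[int]:
    """Byte offset of each compressed chunk when chunks are packed back to back."""
    offsets, pos = [], 0
    for size in sizes:
        offsets.append(pos)
        pos += size
    return offsets


def rewrite_chunk(sizes: list[int], index: int, new_size: int) -> tuple[list[int], int]:
    """Replace chunk `index` with one of `new_size` compressed bytes and compact.

    Returns the new list of chunk sizes and how many bytes of later chunks
    had to be moved. If the size is unchanged the chunk is overwritten in
    place; otherwise every following chunk shifts (towards the start of the
    file if the chunk shrank, towards the end if it grew)."""
    if new_size == sizes[index]:
        return list(sizes), 0
    moved = sum(sizes[index + 1:])
    new_sizes = list(sizes)
    new_sizes[index] = new_size
    return new_sizes, moved


if __name__ == "__main__":
    sizes = [3000] * 1000                       # about 3MB of compressed chunks
    _, moved = rewrite_chunk(sizes, 2, 2500)    # chunk three (index 2) shrinks
    print("bytes moved:", moved)                # prints: bytes moved: 2991000
    _, moved = rewrite_chunk(sizes, 999, 2500)  # last chunk shrinks
    print("bytes moved:", moved)                # prints: bytes moved: 0
```

The cost of a single small write depends on how much data sits after it. That is why near-start edits to large files hurt, and why a writer that sometimes skips compaction can be worthwhile.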

In the case of writing new files or whole-file overwrites, compFUSEd does not have to keep moving chunks around, so performance will not degrade a great deal. Compaction really only affects programs that repeatedly access and overwrite small pieces of data within a file, as a relational database does. You will most likely be using compFUSEd on a filesystem that is not modified a great deal.

To test compFUSEd I downloaded the multi-page version of the Linux Documentation Project HOWTO files. The tar.bz2 file is about 16MB in size, the expanded archive without any compression is 103MB, and the compressed compFUSEd directory is 60MB.
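Expressed as percentages, using only the three sizes reported above, the savings work out as follows:

```python
def percent_saved(original: float, stored: float) -> float:
    """Space saved relative to the uncompressed size, in percent."""
    return 100.0 * (original - stored) / original


# Sizes reported for the HOWTO test, in MB.
expanded, compfused, tarball = 103, 60, 16

print(f"compFUSEd: {percent_saved(expanded, compfused):.0f}% saved")  # prints: compFUSEd: 42% saved
print(f"tar.bz2:   {percent_saved(expanded, tarball):.0f}% saved")    # prints: tar.bz2:   84% saved
```

The gap between roughly 42% and 84% is plausibly explained by how the data is grouped. The .tar.bz2 is a single bzip2 stream over the whole archive, so redundancy shared between files (common HTML boilerplate, for instance) is compressed away once. compFUSEd compresses each file's chunks separately. This explanation is an inference and was not measured here.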

I found a few issues with compFUSEd while performing the HOWTO test. Changing chunk_size from 8192 can cause a great deal of instability. Setting it to a very large value was acceptable, but compFUSEd crashed when I used values around 32KB. Other stability issues seemed to relate to storing a directory tree in the filesystem. If I copied every file directly into the compFUSEd filesystem without creating any subdirectories first, things worked fine. If, on the other hand, I attempted to expand the HOWTO tarball directly or to copy an already expanded directory tree into a compFUSEd filesystem, compFUSEd would crash. The relevant segment of the configuration file and the commands to populate the compFUSEd filesystem are shown below.

$ vi ~/.compFUSEd
backend = /home/ben/.howto.backend
compression = /usr/local/lib/compFUSEd/plugins/cf_bzip2.so
writer = /usr/local/lib/compFUSEd/plugins/writer_isimple.so
chunk_max = 100
chunk_size = 1024000
exclude = gif png jpg jpeg xpm xpi gz tar.gz
$ mkdir ~/howto ~/.howto.backend
$ compFUSEd ~/howto
$ cd /tmp
$ tar xjf /.../Linux-html-HOWTOs.tar.bz2
$ find /tmp/HOWTO -type f -exec cp {} ~/howto/ \;
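Flattening the tree this way avoids the directory-related crashes, but it has a side effect. Two files with the same name in different subdirectories end up at the same path, and cp silently overwrites the first with the second. The Python sketch below performs the same flattening copy of regular files and reports such name clashes instead of overwriting. The flatten_copy name and the command-line usage are hypothetical, and this has not been tested against a compFUSEd mount.

```python
import os
import shutil
import sys


def flatten_copy(src_root: str, dest_dir: str) -> list[str]:
    """Copy every regular file under src_root directly into dest_dir,
    without recreating any subdirectories.

    Returns the source paths that were skipped because a file with the
    same name had already been copied (cp would overwrite those)."""
    copied = {}   # basename -> source path that claimed it
    skipped = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        dirnames.sort()  # visit subdirectories in a predictable order
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            if not os.path.isfile(src):
                continue  # e.g. a dangling symlink
            if name in copied:
                skipped.append(src)
                continue
            copied[name] = src
            shutil.copy(src, os.path.join(dest_dir, name))  # data + mode bits, like cp
    return skipped


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python flatten_copy.py /tmp/HOWTO ~/howto
    for path in flatten_copy(sys.argv[1], os.path.expanduser(sys.argv[2])):
        print("name clash, not copied:", path)
```

Only the first file with a given name is kept. Choosing a renaming scheme for the others (for example, prefixing the subdirectory name) is left open, since any such scheme would break relative links between HTML pages anyway.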

If you want to store a directory of files that you know can be compressed well, and you want to read the files out of that directory with applications that do not handle compression, then compFUSEd might be worth a look. Unfortunately there are still a few bugs in compFUSEd which hold it back from being a drop-in-and-go compression solution for directory trees.

