June 19, 2008

Using ZFS through FUSE

Author: Ben Martin

ZFS is an advanced filesystem created by Sun Microsystems, but it is not supported in the Linux kernel. The ZFS on FUSE project (zfs-fuse) allows you to use ZFS through the Linux kernel as a FUSE filesystem. This means that a ZFS filesystem will be accessible just like any other filesystem the Linux kernel lets you use.

Apart from any technical or funding issues, a major reason that ZFS support has not been integrated into the Linux kernel is that Sun has released it under its Common Development and Distribution License, which is incompatible with the GPL used by the kernel. There are also patent issues with ZFS. However, the source code for ZFS is available, and running ZFS through FUSE does not violate any licenses, because you are not linking CDDL and GPL code together. You're on your own as far as patents go.

The idea of running what is normally an in-kernel filesystem through FUSE will make some in-kernel filesystem developers grumble about inefficiency. When an application makes a call into the kernel, a context switch must be performed, and the x86 architecture is not particularly fast at performing context switches. Because a FUSE filesystem runs in user space, outside the kernel, the kernel must at times perform an additional context switch to the FUSE filesystem process. This means that running a filesystem through FUSE requires more context switches overall than running it in-kernel. However, accessing information stored on disk is so much slower than performing a context switch that performing two context switches instead of one is likely to have minimal impact, if any, on benchmarks. It has been reported that NTFS running through FUSE achieves results comparable to those of a native Linux filesystem.
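
If you want to check the context-switch argument on your own machine, vmstat reports context switches per second in its cs column. The sketch below uses a hypothetical helper name, cs_rate, and assumes the procps vmstat layout, where the column header line begins with r and b. Running it while a benchmark exercises zfs-fuse, and again while the same benchmark runs on an in-kernel filesystem, gives a rough comparison.

```shell
# cs_rate (hypothetical helper): print the "cs" (context switches per
# second) column from vmstat, one value per sample.
# Usage: cs_rate [interval-seconds] [sample-count]
cs_rate() {
    vmstat "${1:-1}" "${2:-5}" | awk '
        # Locate the cs column by name on the "r b ..." header line.
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) if ($i == "cs") col = i
            next
        }
        # Data rows start with a number. The first data row is an
        # average since boot, so skip it and print the rest.
        col && $1 ~ /^[0-9]+$/ { if (seen++) print $col }'
}
```

For example, cs_rate 1 10 samples once per second and prints nine values, because the since-boot row is dropped.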

Installation

At the time of writing, no packages for zfs-fuse exist for Ubuntu, openSUSE, or Fedora, and the latest release of zfs-fuse, 0.4.0 beta, dates from March 2007. Looking at the source repository for the 0.4.x version of zfs-fuse, it appears the developers have made many desirable additions since then, for example the ability to compile with recent versions of gcc, which the March 2007 release lacks. I used the 0.4.x version from the source repository instead of the latest released tarball, and performed benchmarking on a 64-bit Fedora 8 machine.

The source repository uses the Mercurial revision control system, which is itself available in the main Ubuntu Hardy and Fedora 9 repositories. To compile zfs-fuse you will need SCons and the development package for libaio. Both are packaged for Hardy (libaio-dev, scons), available as openSUSE 10.3 1-Click installs (libaio-devel, scons), and included in the Fedora 9 repository. The installation step places five executables into /usr/local/sbin.
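
The package names above can be gathered into a small helper so the right list is used on each distribution. zfs_fuse_deps is a hypothetical name; it lists only Mercurial plus the two dependencies this article names, so your system may also need a compiler and other build tools. The Fedora package name libaio-devel is an assumption based on Fedora's usual -devel naming.

```shell
# zfs_fuse_deps (hypothetical helper): print the packages this article
# names as needed to fetch and build zfs-fuse on a given distribution.
# Usage: zfs_fuse_deps ubuntu|opensuse|fedora
zfs_fuse_deps() {
    case "$1" in
        ubuntu)          echo "mercurial scons libaio-dev" ;;
        opensuse|fedora) echo "mercurial scons libaio-devel" ;;
        *)               echo "unknown distribution: $1" >&2; return 1 ;;
    esac
}
```

On Hardy you might then run sudo apt-get install $(zfs_fuse_deps ubuntu), and on Fedora 9 sudo yum install $(zfs_fuse_deps fedora). On openSUSE 10.3 the 1-Click installs mentioned above remain the simpler route.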

$ hg clone http://www.wizy.org/mercurial/zfs-fuse/0.4.x
$ cd 0.4.x/src
$ scons
$ sudo scons install
$ sudo zfs-fuse

Once the zfs-fuse daemon is started you use the zpool and zfs commands to set up your zfs filesystems. If you have not used ZFS before, you might like to read the OpenSolaris intro or the more serious documentation for it.
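
Because zpool and zfs only work while the zfs-fuse daemon is running, it can be convenient to start the daemon only when it is not already up. The sketch below uses a hypothetical helper name, ensure_zfs_fuse, assumes pgrep is installed and /usr/local/sbin is in root's PATH, and assumes zfs-fuse puts itself in the background when started, as the sudo zfs-fuse step above suggests.

```shell
# ensure_zfs_fuse (hypothetical helper): start the zfs-fuse daemon
# unless a process named exactly "zfs-fuse" is already running.
# Run as root; "scons install" puts zfs-fuse in /usr/local/sbin.
ensure_zfs_fuse() {
    if pgrep -x zfs-fuse >/dev/null 2>&1; then
        echo "zfs-fuse already running"
    else
        zfs-fuse
    fi
}
```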

Performance

I tested performance inside a VMware Server virtual machine, using a new virtual disk with 8GB of preallocated space. Virtualization likely affects the absolute benchmark figures, but the relative performance of ZFS vs. the in-kernel filesystem should still indicate the performance you might expect from ZFS running through FUSE. As the in-kernel Linux filesystem I used XFS, because it performs well with large files such as those the Bonnie++ benchmark creates.

The design of ZFS is a little different from that of most Linux filesystems. Given one or more partitions, you set up a ZFS "pool," and then create as many filesystems as you like inside that pool. For the benchmark I created a pool on a single partition of the 8GB virtual disk and created two ZFS filesystems in that pool. To benchmark XFS I created an XFS filesystem directly on the partition that ZFS had been using, wiping out the ZFS data in the process.
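
The pool-then-filesystems workflow can be sketched as a small helper. make_pool_with_filesystems is a hypothetical name, and the pool, device, and filesystem names you pass are placeholders. zpool create claims the device for the pool, so only point it at a partition whose contents you can afford to lose.

```shell
# make_pool_with_filesystems (hypothetical helper)
# Usage: make_pool_with_filesystems POOL DEVICE FS...
# Creates POOL on DEVICE, then POOL/FS for each FS. Every filesystem
# draws space from the shared pool and, by default, is mounted at
# /POOL/FS (as tank/testfs is mounted at /tank/testfs in this article).
make_pool_with_filesystems() {
    local pool=$1 dev=$2 fs
    shift 2
    zpool create "$pool" "$dev" || return 1
    for fs in "$@"; do
        zfs create "$pool/$fs" || return 1
    done
    # Show the pool's filesystems and their space use.
    zfs list -r "$pool"
}
```

For instance, make_pool_with_filesystems tank /dev/sdd1 testfs scratch would create the tank pool used here plus two filesystems, where scratch is a hypothetical second name.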

Shown below is the setup and benchmarking of ZFS. First I use fdisk to create a single partition spanning the whole disk. I then use the zpool create command to create a new pool, associating physical disks with the pool. The -n option reports what would be done without actually creating the pool; I include its output here to make things easier to follow. Once I create the tank/testfs ZFS filesystem with the zfs command, I have a new filesystem that I can access through the Linux kernel at /tank/testfs, as the standard df command shows. I then ran the Bonnie++ benchmark multiple times to make sure the figures were not taken from a first run that was disadvantaged in any manner.

# fdisk /dev/sdd
...
Disk /dev/sdd: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
...
/dev/sdd1               1        1044     8385898+  83  Linux
...
# zfs-fuse
# zpool create -n tank /dev/sdd1
would create 'tank' with the following layout:

        tank
          sdd1
# zpool create tank /dev/sdd1
# zpool list
NAME   SIZE   USED   AVAIL  CAP  HEALTH  ALTROOT
tank   7.94G  92.5K  7.94G  0%   ONLINE  -
# zfs create tank/testfs
# df -h /tank/testfs/
Filesystem            Size  Used Avail Use% Mounted on
tank/testfs           7.9G   18K  7.9G   1% /tank/testfs

$ cd /tank/testfs
$ /usr/sbin/bonnie++ -d `pwd`
...
$ /usr/sbin/bonnie++ -d `pwd`
Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
linuxcomf8       4G 12373  24 14707  11 10604   8 33935  50 36985   3 109.0   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  2272  17  3657  20  2754  18  2534  15  3736  20  3061  20
linuxcomf8,4G,12373,24,14707,11,10604,8,33935,50,36985,3,109.0,0,16,2272,17,3657,20,2754,18,2534,15,3736,20,3061,20
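
The run-several-times-and-keep-the-last approach used here can be scripted. The sketch below uses a hypothetical helper name, bench_last_run, and relies on Bonnie++'s -q quiet mode, which according to its manual page writes only the CSV data to standard output and the human-readable tables to standard error. The BONNIE override variable is also an assumption of this sketch.

```shell
# bench_last_run (hypothetical helper): run Bonnie++ RUNS times in DIR
# and print the CSV line from the last run only; earlier runs serve as
# warm-up. Set BONNIE to override the binary path used in this article.
# Usage: bench_last_run DIR [RUNS]
bench_last_run() {
    local dir=$1 runs=${2:-3} bonnie=${BONNIE:-/usr/sbin/bonnie++}
    local i=1 last=
    while [ "$i" -le "$runs" ]; do
        # -q: only the CSV data goes to stdout; the tables go to stderr.
        last=$("$bonnie" -d "$dir" -q) || return 1
        i=$((i + 1))
    done
    printf '%s\n' "$last"
}
```

Running bench_last_run /tank/testfs 3 > zfs.csv as an ordinary user, as in the listing above, would leave a single CSV line like the one shown.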

The commands below show how the Bonnie++ benchmark was performed on the XFS filesystem. Once again, I ran the benchmark multiple times.

# mkfs.xfs /dev/sdd1
meta-data=/dev/sdd1              isize=256    agcount=8, agsize=262059 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=2096472, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=2560, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
# mkdir /raw
# mount /dev/sdd1 /raw

$ cd /raw
$ /usr/sbin/bonnie++ -d `pwd`
...
$ /usr/sbin/bonnie++ -d `pwd`
Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
linuxcomf8       4G 38681  65 34840   6 16528   6 18312  40 18585   5 365.8   2
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1250  26 +++++ +++  3032  39  2883  69 +++++ +++  3143  59
linuxcomf8,4G,38681,65,34840,6,16528,6,18312,40,18585,5,365.8,2,16,1250,26,+++++,+++,3032,39,2883,69,+++++,+++,3143,59

As you can see from the benchmark results, for output operations you might get only about 30-65% of the performance of XFS when using ZFS through FUSE. On the other hand, the caching that FUSE performs allowed zfs-fuse to perform noticeably better than XFS on both the character and block input tests, reaching nearly twice the XFS throughput. In real-world terms, this suggests there is little or no speed penalty for using ZFS through FUSE on a filesystem that is read more than it is written, at least for sequential reads; random seeks, by contrast, ran at only about 30% of the XFS rate in this test. Write operations do suffer a performance loss with zfs-fuse as opposed to an in-kernel filesystem, but the loss should not render the system unusable. As always, you should benchmark for the task you have at hand to make sure you can get the performance you expect.
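
The ratios discussed above can be recomputed from the CSV lines Bonnie++ prints at the end of each run. compare_bonnie is a hypothetical helper; it assumes the Bonnie++ 1.03 CSV field order seen in the two runs above and prints the first line's figure as a percentage of the second's for the six throughput and seek columns.

```shell
# compare_bonnie (hypothetical helper): given two Bonnie++ 1.03 CSV
# lines A and B, print A's result as a percentage of B's.
# CSV fields used (1-based): 3 per-char output, 5 block output,
# 7 rewrite, 9 per-char input, 11 block input, 13 random seeks.
compare_bonnie() {
    printf '%s\n%s\n' "$1" "$2" | awk -F, '
        NR == 1 { for (i = 1; i <= NF; i++) a[i] = $i; next }
        {
            split("3 5 7 9 11 13", field, " ")
            split("out-chr out-block rewrite in-chr in-block seeks", name, " ")
            for (k = 1; k <= 6; k++) {
                f = field[k]
                if ($f + 0 == 0 || a[f] + 0 == 0) {
                    # "+++++" (too fast to measure) reads as 0: no ratio.
                    printf "%-10s n/a\n", name[k]
                } else {
                    printf "%-10s %4.0f%%\n", name[k], 100 * a[f] / $f
                }
            }
        }'
}
```

Passing the zfs-fuse CSV line first and the XFS line second gives about 32% (per-character output), 42% (block output), 64% (rewrite), 185% (per-character input), 199% (block input), and 30% (random seeks).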

There are many issues with running ZFS under Linux. For instance, the zfs-fuse process runs as the root user, which raises potential security concerns and gives any bugs present in zfs-fuse free rein over the system. Also, the sharenfs ZFS directive does not currently work with zfs-fuse, and if you wish to export your ZFS filesystems over NFS manually, you'll likely have to recompile your FUSE kernel module too.

zfs-fuse does bring to Linux the flexibility of creating many filesystems with ZFS, and the way ZFS handles quotas and space reservations can make system administration easier. Because ZFS pools let you quickly create as many filesystems as you like, it's not uncommon to create a new ZFS filesystem in your pool for each new project you work on. Quick, easy filesystem creation fits well with the rest of ZFS administration, where you can snapshot a ZFS filesystem in its current state and export the current filesystem or a snapshot to another machine. As mentioned above, however, zfs-fuse does not currently support the sharenfs directive.
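
The per-project workflow described above might look like the following sketch. The helper names, the tank and backup pool names, the host name, and the 2G/1G sizes are all hypothetical; quota and reservation are standard ZFS properties, and the send/receive step assumes your zfs-fuse build supports zfs send and zfs receive and that the other machine also runs ZFS.

```shell
# new_project (hypothetical helper): give a project its own filesystem,
# capped by a quota and guaranteed space by a reservation.
# Usage: new_project NAME
new_project() {
    local fs="tank/$1"
    # quota: the filesystem may not grow past 2G.
    # reservation: 1G of pool space is set aside for it.
    zfs create "$fs" &&
        zfs set quota=2G "$fs" &&
        zfs set reservation=1G "$fs"
}

# backup_project (hypothetical helper): snapshot the project's filesystem
# and copy that snapshot to HOST, where it becomes backup/NAME.
# Usage: backup_project NAME SNAPNAME HOST
backup_project() {
    local snap="tank/$1@$2"
    zfs snapshot "$snap" &&
        zfs send "$snap" | ssh "$3" zfs receive "backup/$1"
}
```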

ZFS also reimplements functionality that Linux provides in the kernel, such as software RAID and logical volume management (LVM). One downside of this, as noted on page 60 of the March 2008 ZFS administration documentation, is that you cannot attach an additional disk to an existing RAID-Z configuration. With Linux software RAID, by contrast, you can grow an existing RAID-5 array, adding new disks as you need them.
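
For comparison, growing a Linux software RAID-5 array with mdadm is a two-step operation: add the new disk, then reshape the array to use it. The sketch below uses a hypothetical helper name and device names, and assumes a kernel and mdadm version recent enough to support RAID-5 reshaping.

```shell
# grow_raid5 (hypothetical helper): add NEWDISK to the md RAID-5 array MD
# and reshape the array to use COUNT active devices.
# Usage: grow_raid5 /dev/md0 /dev/sde1 4
grow_raid5() {
    local md=$1 newdisk=$2 count=$3
    # The new disk first joins the array as a spare; --grow then
    # reshapes the array so the spare becomes an active member.
    mdadm --add "$md" "$newdisk" &&
        mdadm --grow "$md" --raid-devices="$count"
}
```

The reshape runs in the background, with progress visible in /proc/mdstat, and the filesystem on the array must then be enlarged separately, for example with xfs_growfs for XFS.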
