June 20, 2005

CLI Magic: Use Extended Attributes for better file management

Author: JT Smith

There are many organizational techniques that contribute to efficient
file management. Thoughtful and effective directory hierarchies help
users locate content with ease. Consistent and expressive file
nomenclatures give users the ability to discern the nature of a
file's content at a glance. Unfortunately, there are many things that
even the most expressive file name can't convey. In some
cases, there is just too much information, most of which doesn't warrant
inclusion in a concise file name. Many unique file formats now include
embedded meta-data mechanisms that provide users with a way to 'tag'
files. With a specialized tag editor, users can easily associate a
title, artist and album with a specific MP3 file, for instance. Wouldn't
it be nice to be able to associate arbitrary tag data with any kind of
file or directory? With extended attributes, you can.

Extended attributes are essentially name/value pairs that can be
assigned to any file or directory. This powerful file system meta-data
feature, when intelligently used, can facilitate tremendously efficient
file management. The goal of this example-driven overview is to
illustrate the power of extended attributes and demonstrate potential
uses. You will need a basic understanding of awk and bash to appreciate
the significance of some of the examples, but even users without a lot
of command line experience will be able to understand the commands and
figure out how to use them. By the time you finish reading this article,
you will be able to take advantage of the coolest file system feature to
be implemented since the symbolic link.

Getting Started

The first step is configuration. In days of yore, intrepid users had to
patch their kernel to get support for extended attributes.
Lucky for you, the feature is now a standard part of the 2.4 and 2.6
kernels, and it is widely supported by most major distributions and
almost all the prominent file systems. In most cases, it is just a
matter of enabling the feature. If you are fortunate enough to be using
an XFS file system, the feature is already enabled and you can skip the
rest of this section. If you are using ext2, ext3 or reiser, you will
have to add the user_xattr option to the drive's entry in your /etc/fstab
file and remount the partition. The updated entry should vaguely
resemble this:

/dev/hda1 / ext3 defaults,noatime,user_xattr 0 1

After you have altered your fstab file, you can either reboot your
computer, or remount the partition like so:

mount -o remount,user_xattr /
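
Before remounting, it can help to confirm that the option really landed in
the right fstab entry. What follows is only a minimal sketch:
'has_user_xattr' is a hypothetical helper name of my own, and it assumes the
usual whitespace-separated fstab layout, with the mount point in the second
field and the options in the fourth.

```shell
# has_user_xattr MOUNTPOINT [FSTAB]
# Hypothetical helper: succeeds if the fstab entry for MOUNTPOINT
# lists user_xattr among its mount options.
has_user_xattr() {
  awk -v mp="$1" '
    /^[ \t]*#/ { next }                  # skip comment lines
    $2 == mp {
      n = split($4, opts, ",")           # options are comma-separated
      for (i = 1; i <= n; i++)
        if (opts[i] == "user_xattr") found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "${2:-/etc/fstab}"
}
```

Running 'has_user_xattr / && echo enabled' checks the root entry of
/etc/fstab. The optional second argument points the check at another file.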

Now you need to install the commands and the associated library. An
'attr' package is available for many distributions, and source code is
available from the SGI web site.

That's all there is to it. Now that everything is configured properly,
it's time to learn some new commands.

The Commands

Manipulation of extended attributes can be done with three commands:
setfattr, getfattr, and attr. This article does not cover the attr
command, which exists solely for the sake of IRIX compatibility.
As you can guess, the setfattr command sets attributes, and the getfattr
command retrieves them. To start with, we will add an attribute named
'testing', with a value of 'this is a test', to a file called
'test-1.txt':


setfattr -n user.testing -v "this is a test" test-1.txt

The '-n' parameter specifies the name of the attribute. Period-delimited
attribute namespaces are used to reduce naming conflicts. All attributes
explicitly added by users must be in the 'user' namespace, which is why
the attribute in the example is named 'user.testing'. For serious
employment of this feature, more elaborate namespaces are advisable. The
'-v' parameter specifies the value of the attribute. Note that the value
is enclosed in quotation marks because the string includes spaces.
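
Because every attribute an ordinary user sets has to live under 'user.', it
is easy to wrap setfattr so the prefix is never forgotten. This is a sketch
under that assumption only. 'qualify_attr' and 'tag' are hypothetical names
of my own, not part of the attr package.

```shell
# qualify_attr NAME
# Hypothetical helper: prints NAME with the "user." namespace prefix,
# adding it only when it is not already there.
qualify_attr() {
  case $1 in
    user.*) printf '%s\n' "$1" ;;
    *)      printf 'user.%s\n' "$1" ;;
  esac
}

# tag NAME VALUE FILE...
# Thin wrapper around setfattr that always writes to the user namespace.
tag() {
  local name value
  name=$(qualify_attr "$1"); value=$2
  shift 2   # drop the name and value; only file names remain
  setfattr -n "$name" -v "$value" "$@"
}
```

With these defined, 'tag testing "this is a test" test-1.txt' does the same
thing as the setfattr command shown above.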

Now we will use the getfattr command to retrieve the 'testing'
attribute:


getfattr -n user.testing test-1.txt

This produces the following output:

  # file: test-1.txt
  user.testing="this is a test"

If you just want it to display the value of the attribute, you can use
the '--only-values' parameter:

getfattr --only-values -n user.testing test-1.txt
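
The '--only-values' form is the convenient one inside scripts, where you
usually want the value in a variable. The sketch below relies on getfattr
returning a non-zero exit status when the attribute cannot be read.
'attr_or_default' is a hypothetical helper name.

```shell
# attr_or_default FILE NAME FALLBACK
# Hypothetical helper: prints the value of attribute NAME on FILE, or
# FALLBACK when the attribute (or getfattr itself) is unavailable.
attr_or_default() {
  local value
  if value=$(getfattr --only-values -n "$2" "$1" 2>/dev/null); then
    printf '%s\n' "$value"
  else
    printf '%s\n' "$3"
  fi
}
```

For example, 'attr_or_default test-1.txt user.testing "(untagged)"' prints
the stored value if there is one and '(untagged)' otherwise.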

The setfattr command is also used for removing attributes. To get rid of
our 'testing' attribute, all I have to do is:

setfattr -x user.testing test-1.txt

Unfortunately, not all programs and file systems support extended
attributes. If you copy the files to a different file system or
manipulate the files with a utility that doesn't support the feature,
the attributes will disappear. If you want to preserve the attributes,
you can use the '--dump' parameter of the getfattr command to generate a
complete listing of all the attributes and values associated with the
target:


getfattr --dump * > data_file

When you move the files back to a file system that supports extended
attributes, you can restore the attribute data to the files by using the
'--restore' parameter of the setfattr command:



setfattr --restore=data_file
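
Before restoring, it is worth checking what a dump file actually covers. The
sketch below assumes the dump layout shown earlier: a '# file: ' line for
each file, followed by one name=value line per attribute. 'dump_summary' is a
hypothetical helper name.

```shell
# dump_summary DUMPFILE
# Hypothetical helper: prints each file recorded in a getfattr dump
# together with the number of attributes stored for it.
dump_summary() {
  awk '
    /^# file: / {
      if (name != "") print count, name
      name = substr($0, 9); count = 0   # text after the 8-character marker
      next
    }
    /=/ { count++ }                     # one name=value line per attribute
    END { if (name != "") print count, name }
  ' "$1"
}
```

Running 'dump_summary data_file' prints one line per file, with the attribute
count first, so a missing file or an unexpectedly empty entry stands out.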

There are a few other options and parameters that are not covered in
this article. For more information, you can refer to the man pages.

A Few (Relatively) Simple Examples

As a journalist, I write a lot of articles. The number of files in
~/doc/technical/articles has steadily grown, and it occurs to me that it
will soon become difficult to manage. I can use extended attributes to
simplify the task.

First, let's think about the kind of attributes it might be helpful to
associate with articles. There are a few in particular that come to
mind: title, date of publication, and
publication venue. There are many other attributes that I could add, but
I want to keep it simple, and I want to avoid assigning attributes for
things like word count that I can easily ascertain with other simple
commands. Now let's think about namespace issues. In order to prevent my
attribute names from conflicting with attributes added in the future by
other programs, I will put all my attributes in the 'user.article'
namespace.

Now I will manually set the individual attributes for all of my
articles. Here is an example:

setfattr -n user.article.title -v "Innovations in Window Management" \
  article-commentary-wm_innovations.txt

It is also possible to add an attribute to multiple files at once. To
demonstrate this, I will add a 'user.article.author' attribute to all my
article files:


setfattr -n user.article.author -v "Ryan Paul" *.txt

Now I'll show you how to use the attributes for filtering and file
management. Let's start by trying to list all of the files with articles
that I have written for Newsforge. The output format of the getfattr
command is kind of unusual, so we have to use awk to extract the relevant
data.
The getfattr output looks something like this:

  # file: article-commentary-wm_innovations.txt
  user.article.venue="newsforge"

  # file: article-comparison-xml_authoring_tools.txt
  user.article.venue="newsforge"

If we treat each '# file: ' entry as an awk record, and each line of
that record as an awk field, we should be able to get what we want by
grabbing the first field of every record that contains '="newsforge"':

getfattr -n user.article.venue *.txt | awk 'BEGIN {RS="# file: ";
  FS="\n"} /="newsforge"/ {print $1}'
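
The record-splitting trick above depends on an awk that accepts a
multi-character record separator. As a hedged alternative, here is a
line-oriented sketch that remembers the most recent '# file: ' line and
compares the end of each attribute line literally. 'files_with_value' is a
hypothetical name.

```shell
# files_with_value VALUE
# Reads getfattr output on stdin and prints the name of every file
# that has an attribute line ending in ="VALUE".
files_with_value() {
  awk -v want="$1" '
    /^# file: / { name = substr($0, 9); next }   # remember current file
    {
      suffix = "=\"" want "\""
      # Compare the tail of the line literally, so VALUE is not a regex
      if (length($0) >= length(suffix) &&
          substr($0, length($0) - length(suffix) + 1) == suffix)
        print name
    }
  '
}
```

It is used as the last stage of a pipeline, for example
'getfattr -n user.article.venue *.txt | files_with_value newsforge'. One
caveat: awk interprets backslash escapes in values passed with '-v', so
values containing backslashes need extra care.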

To simplify matters, we can abstract this into a bash function:

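  ea_query() {
    # usage: ea_query ATTRIBUTE VALUE FILE...
    local a=$1 v=$2
    shift 2   # drop the attribute and value; only file names remain
    getfattr -n "$a" "$@" |
    awk "BEGIN  {
      RS=\"# file: \"
      FS=\"\n\"
    }/=\"$v\"/ { print \$1 }"
  }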

We can use this function to perform arbitrary queries. The following
command-line call:


ea_query user.article.venue newsforge *.txt

will list all the .txt files for which 'newsforge' is the value
associated with the 'user.article.venue' attribute. Now let's try using
the queries for some simple file management. If I want to copy all the
files containing articles written for Newsforge into a
'newsforge_articles' directory, I can do this:

cp `ea_query user.article.venue newsforge *.txt` newsforge_articles
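
The backtick form splits the query output on whitespace, so it misbehaves as
soon as a file name contains a space. A minimal sketch of a safer variation
reads the names one line at a time instead. 'copy_listed' is a hypothetical
helper name, and it assumes one file name per line with no embedded newlines.

```shell
# copy_listed DEST
# Hypothetical helper: copies every file named on stdin (one per line)
# into the directory DEST, tolerating spaces in file names.
copy_listed() {
  local dest=$1 f
  mkdir -p -- "$dest" || return 1
  while IFS= read -r f; do
    if [ -n "$f" ]; then cp -- "$f" "$dest"/; fi
  done
}
```

The query then feeds the copy through a pipe:
'ea_query user.article.venue newsforge *.txt | copy_listed newsforge_articles'.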

What if I want to make a tarball containing all articles I wrote for
Newsforge that are longer than 1200 words? To find out which articles
are longer than 1200 words, I use an old trick: I filter 'wc -w' through
awk and output the names of all the files that fulfill a greater-than
comparison. I can use the output of that as the input for the query, and
I can use the output of the query as the input for tar:


tar -czf long_newsforge_articles.tgz $(ea_query user.article.venue \
  newsforge $(wc -w *.txt | awk '/\.txt/ {if ($1 > 1200) print $2 }'))

A useful variation on the above example might involve compressing
Newsforge articles containing a specific word or phrase. You can do that
by using grep rather than wc and awk.
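
Both pre-filters can be sketched as small functions whose output is a list of
file names, ready to hand to the query. 'longer_than' and 'mentioning' are
hypothetical names of my own. The second one relies on grep's '-l' option to
print matching file names only, and '-F' to treat the phrase as a fixed
string rather than a pattern.

```shell
# longer_than N FILE...
# Prints the names of files containing more than N words.
longer_than() {
  local min=$1 f
  shift
  for f in "$@"; do
    # Reading from stdin makes wc print only the count, not the name
    if [ "$(wc -w < "$f")" -gt "$min" ]; then printf '%s\n' "$f"; fi
  done
}

# mentioning PHRASE FILE...
# Prints the names of files containing PHRASE as a fixed string.
mentioning() {
  local phrase=$1
  shift
  grep -lF -- "$phrase" "$@"
}
```

The grep variation then becomes 'ea_query user.article.venue newsforge
$(mentioning "window manager" *.txt)', typed as a single command line, and
the result can be passed to tar exactly as before.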

Once you have put together a few handy bash functions or shell scripts
for attribute manipulation, you should be able to integrate extended
attribute queries into your repertoire of file management techniques
with relative ease.

An Arcane Example

For the benefit of system administrators and ambitious readers, I will
now present a more sophisticated command-line example. I am going to
show how I list, in order by publication date, the title and filename of
every review I have written for Newsforge that has been published since
November of 2004, is longer than 1000 words, and contains the word
"Linux". For this example, you will need Ruby and Aredridel's excellent
xattr module (http://theinternetco.net/projects/ruby/ruby-xattr).

I have included line breaks to increase the readability of the example:


  ruby -r xattr -e '

    pubdate = proc {|f|
      Time.gm(*f.get_attr("article.date.published").split("-"))
    };

    Dir["*review*.txt"].map {|fn| File.open fn }.find_all {|f|

      c = f.read;
      f.get_attr("article.venue") == "newsforge" and
      c.split.length > 1000 and
      c.include? "Linux" and
      pubdate[f] > Time.gm(2004, "nov", 01)

    }.sort {|f1,f2| pubdate[f1] <=> pubdate[f2] }.each {|f|

      puts "#{f.get_attr("article.title")} #{f.path}"

    }'

I start by defining a 'pubdate' function that will convert my
"article.date.published" attribute string into a Ruby 'Time' instance. Then, I
use the 'Dir[]' class method to generate a list of all files in the
current directory that match the "*review*.txt" glob. I filter that list
of files through a 'find_all' block that performs the necessary checks,
and then I pass the results to a 'sort' block that performs publication
date comparisons using the 'pubdate' function. Finally, I send the title
and name of each file to stdout in a concluding 'each' block. Note that
I use the '-r xattr' parameter to include Aredridel's module.
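
For readers without Ruby at hand, the date filtering and ordering can be
approximated in the shell. This is a rough sketch, not a translation of the
one-liner. It assumes the publication date is stored as YYYY-MM-DD under
'user.article.date.published', a layout I am inferring from the split("-")
call above. 'since' and 'dated_list' are hypothetical names.

```shell
# since DATE
# Reads "YYYY-MM-DD<TAB>name" lines on stdin, keeps those dated after
# DATE, and prints them oldest first. ISO dates compare correctly as text.
since() {
  awk -F '\t' -v cutoff="$1" '($1 "") > (cutoff "")' | sort
}

# dated_list FILE...
# Prints "date<TAB>file" for each file carrying the assumed
# user.article.date.published attribute; other files are skipped.
dated_list() {
  local f d
  for f in "$@"; do
    if d=$(getfattr --only-values -n user.article.date.published "$f" 2>/dev/null); then
      printf '%s\t%s\n' "$d" "$f"
    fi
  done
}
```

Running 'dated_list *review*.txt | since 2004-11-01' then lists the reviews
published after the first of November 2004, oldest first.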

Ruby lends itself well to command line administrative work. You can use
variations of the above example for a wide variety of file and system
management tasks in native Ruby, or you can output file names, and pipe
the result to other shell commands.

If you have some programming experience, you can use the xattr lib to
make elaborate scripts and utilities that can manipulate extended
attributes. For those who prefer Python for application development,
the pyxattr module is also available.

Conclusion

Whenever I learn a new command or a new command line trick, I celebrate
by putting it to good use. I've given you a good start. Now it's your
turn. Find a creative way to use extended attributes, and demonstrate
your command line prowess by leaving a comment with a few examples, or
by sharing your experiences.