Linux.com
CLI Magic: Use Extended Attributes for better file management
June 20, 2005 (8:00:00 AM)
By: Ryan Paul
There are many organizational techniques that contribute to efficient
file management. Thoughtful and effective directory hierarchies help
users locate content with ease. Consistent and expressive file
nomenclatures give users the ability to discern the nature of a
file's content at a glance. Unfortunately, there are many things that
even the most expressive file name can't convey. In some
cases, there is just too much information, most of which doesn't warrant
inclusion in a concise file name. Many file formats now include
embedded meta-data mechanisms that provide users with a way to 'tag'
files. With a specialized tag editor, users can easily associate a
title, artist and album with a specific MP3 file, for instance. Wouldn't
it be nice to be able to associate arbitrary tag data with any kind of
file or directory? With extended attributes, you can.
Extended attributes are essentially name/value pairs that can be
assigned to any file or directory. This powerful file system meta-data
feature, when used intelligently, can make file management far more
efficient. The goal of this example-driven overview is to
illustrate the power of extended attributes and demonstrate potential
uses. You will need a basic understanding of awk and bash to appreciate
the significance of some of the examples, but even users without a lot
of command line experience will be able to understand the commands and
figure out how to use them. By the time you finish reading this article,
you will be able to take advantage of the coolest file system feature to
be implemented since the symbolic link.
Getting Started
The first step is configuration. In days of yore, intrepid users had to patch their kernels to get support for extended attributes. Lucky for you, the feature is now a standard part of the 2.4 and 2.6 kernels, and it is supported by most major distributions and almost all the prominent file systems. In most cases, it is just a matter of enabling the feature. If you are fortunate enough to be using an XFS file system, the feature is already enabled and you can skip the rest of this section. If you are using ext2, ext3 or ReiserFS, you will have to add the user_xattr option to the partition's entry in your /etc/fstab file and remount the partition. The updated entry should look something like this:
/dev/hda1 / ext3 defaults,noatime,user_xattr 0 1
After you have altered your fstab file, you can either reboot your computer or remount the partition as root:
mount -o remount,user_xattr /
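Before remounting, it can help to confirm that the fstab entry really carries the option. The function below is a minimal sketch (fstab_has_user_xattr is a hypothetical name, not part of any package) that scans an fstab-style file with awk and succeeds only if the uncommented entry for a given mount point lists user_xattr. It checks the file's text only; it does not prove the mounted file system honors the option, and newer kernels may enable user attributes on some file systems by default without the option appearing in fstab.

```bash
# fstab_has_user_xattr MOUNTPOINT [FSTAB]
# Hypothetical helper: exit status 0 if the (non-comment) entry for
# MOUNTPOINT in FSTAB (default /etc/fstab) lists the user_xattr option.
fstab_has_user_xattr() {
    local mnt=$1 fstab=${2:-/etc/fstab}
    awk -v mnt="$mnt" '
        $1 !~ /^#/ && $2 == mnt {
            # field 4 holds the comma-separated mount options
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "user_xattr") found = 1
        }
        END { exit !found }' "$fstab"
}

# Example:
#   fstab_has_user_xattr / && echo "user_xattr is set for /"
```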
Now you need to install the commands and the associated library. An 'attr' package is available for many distributions, and source code is available from the SGI web site.
That's all there is to it. Now that everything is configured properly, it's time to learn some new commands.
The Commands
Manipulation of extended attributes can be done with three commands: setfattr, getfattr, and attr. This article does not cover the attr command, which exists solely for the sake of IRIX compatibility. As you can guess, the setfattr command sets attributes, and the getfattr command retrieves them. To start with, we will add an attribute named 'testing', with a value of 'this is a test', to a file called 'test-1.txt':
setfattr -n user.testing -v "this is a test" test-1.txt
The '-n' parameter specifies the name of the attribute. Period-delimited attribute namespaces are used to reduce naming conflicts. All attributes explicitly added by users must be in the 'user' namespace, which is why the attribute in the example is named 'user.testing'. For serious use of this feature, more elaborate namespaces within 'user' are advisable. The '-v' parameter specifies the value of the attribute. Note that the value is enclosed in quotation marks because the string includes spaces.
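Because every attribute a user adds must live in the 'user' namespace, it is easy to forget the prefix. One way to guard against that is a tiny wrapper like the hypothetical ns_attr function below (a sketch of my own, not part of the attr package), which leaves names already in one of the standard Linux namespaces (user, trusted, security, system) alone and prepends 'user.' to anything else.

```bash
# ns_attr NAME: hypothetical helper that prints NAME unchanged if it already
# starts with a known namespace, otherwise prints it with "user." prepended.
ns_attr() {
    case $1 in
        user.*|trusted.*|security.*|system.*) printf '%s\n' "$1" ;;
        *) printf 'user.%s\n' "$1" ;;
    esac
}

# Example (needs setfattr and an xattr-enabled file system):
#   setfattr -n "$(ns_attr testing)" -v "this is a test" test-1.txt
```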
Now we will use the getfattr command to retrieve the 'testing' attribute:
getfattr -n user.testing test-1.txt
This will produce the following output:
# file: test-1.txt
user.testing="this is a test"
If you just want it to display the value of the attribute, you can use the '--only-values' parameter:
getfattr --only-values -n user.testing test-1.txt
The setfattr command is also used for removing attributes. To get rid of our 'testing' attribute, all we have to do is:
setfattr -x user.testing test-1.txt
Unfortunately, not all programs and file systems support extended attributes. If you copy the files to a different file system or manipulate the files with a utility that doesn't support the feature, the attributes will disappear. If you want to preserve the attributes, you can use the '--dump' parameter of the getfattr command to generate a complete listing of all the attributes and values associated with the target:
getfattr --dump * > data_file
When you move the files back to a file system that supports extended attributes, you can restore the attribute data to the files by using the '--restore' parameter of the setfattr command:
setfattr --restore=data_file
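Since the dump file is plain text, with a '# file:' line followed by one name="value" line per attribute and a blank line between files, it is also easy to post-process. The hypothetical dump_to_tsv function below is a minimal awk sketch that flattens getfattr --dump output into tab-separated file, attribute and value columns for use with sort, grep or a spreadsheet. It assumes the values are in getfattr's double-quoted text form and does not decode any escape sequences getfattr may emit for unusual characters.

```bash
# dump_to_tsv: read `getfattr --dump` output on stdin and print one
# "FILE<TAB>ATTRIBUTE<TAB>VALUE" line per attribute.
dump_to_tsv() {
    awk '
        /^# file: / { file = substr($0, 9); next }   # "# file: " is 8 chars
        /^[^#]/ && index($0, "=") {
            eq = index($0, "=")
            name = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            # strip the surrounding double quotes of text-encoded values
            if (value ~ /^".*"$/) value = substr(value, 2, length(value) - 2)
            printf "%s\t%s\t%s\n", file, name, value
        }'
}

# Example:
#   getfattr --dump *.txt | dump_to_tsv > attributes.tsv
```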
There are a few other options and parameters that are not covered in this article. For more information, you can refer to the man pages.
A Few (Relatively) Simple Examples
As a journalist, I write a lot of articles. The number of files in ~/doc/technical/articles has steadily grown, and it occurs to me that it will soon become difficult to manage. I can use extended attributes to simplify the task.
First, let's think about the kind of attributes it might be helpful to associate with articles. There are a few in particular that come to mind: title, date of publication, and publication venue. There are many other attributes that I could add, but I want to keep it simple, and I want to avoid assigning attributes for things like word count that I can easily ascertain with other simple commands. Now let's think about namespace issues. In order to prevent my attribute names from conflicting with attributes added in the future by other programs, I will put all my attributes in the 'user.article' namespace.
Now I will manually set the individual attributes for all of my articles. Here is an example:
setfattr -n user.article.title -v "Innovations in Window Management" article-commentary-wm_innovations.txt
It is also possible to add an attribute to multiple files at once. To demonstrate this, I will add a user.article.author attribute to all my article files:
setfattr -n user.article.author -v "Ryan Paul" *.txt
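Setting titles one file at a time gets tedious. Because setfattr --restore reads the same format that getfattr --dump writes, one approach is to keep the titles in a tab-separated text file and generate a dump file from it. The tsv_to_dump function and the titles.tsv file name below are hypothetical; the sketch performs no escaping, so it assumes values contain no double quotes, backslashes or newlines.

```bash
# tsv_to_dump ATTRIBUTE: read "FILE<TAB>VALUE" lines on stdin and print them
# in getfattr --dump format so that setfattr --restore can apply them.
tsv_to_dump() {
    awk -v attr="$1" '
        BEGIN { FS = "\t" }
        NF >= 2 { printf "# file: %s\n%s=\"%s\"\n\n", $1, attr, $2 }'
}

# Example (hypothetical file names; needs setfattr):
#   tsv_to_dump user.article.title < titles.tsv > titles.dump
#   setfattr --restore=titles.dump
```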
Now I'll show you how to use the attributes for filtering and file management. Let's start by trying to list all of the files containing articles that I have written for Newsforge. The output format of the getfattr command is somewhat unusual, so we have to use awk to extract the relevant data. The getfattr output looks something like this:
# file: article-commentary-wm_innovations.txt
user.article.venue="newsforge"

# file: article-comparison-xml_authoring_tools.txt
user.article.venue="newsforge"
If we treat each '# file: ' entry as an awk record, and each line of that record as an awk field, we should be able to get what we want by grabbing the first field of every record that contains '="newsforge"':
getfattr -n user.article.venue *.txt |
    awk 'BEGIN { RS = "# file: "; FS = "\n" } /="newsforge"/ { print $1 }'
To simplify matters, we can abstract this into a bash function:
ea_query() {
    # $1 = attribute name, $2 = value to match, remaining arguments = files
    local a=$1 v=$2
    shift 2
    getfattr -n "$a" "$@" |
    awk "BEGIN {
        RS=\"# file: \"
        FS=\"\n\"
    } /=\"$v\"/ { print \$1 }"
}
We can then use this function to perform arbitrary queries. The following command:
ea_query user.article.venue newsforge *.txt
will list all the .txt files for which 'newsforge' is the value associated with the 'user.article.venue' attribute. Now let's try using the queries for some simple file management. If I want to copy all the files containing articles written for Newsforge into a 'newsforge_articles' directory, I can do this:
cp `ea_query user.article.venue newsforge *.txt` newsforge_articles
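The awk pattern in ea_query is a regular expression that can match anywhere in a record, so a value such as 'linux.com' would also match 'linux-com'. A stricter, line-oriented variant that compares values exactly is sketched below; ea_filter is a hypothetical name, and it reads getfattr output on standard input, which also makes it easy to try on sample text without an xattr-enabled file system.

```bash
# ea_filter VALUE: read `getfattr -n ATTRIBUTE ...` output on stdin and print
# each file whose attribute value is exactly VALUE (no regex matching).
ea_filter() {
    awk -v want="\"$1\"" '
        /^# file: / { file = substr($0, 9); next }   # remember current file
        index($0, "=") {
            value = substr($0, index($0, "=") + 1)    # text after the first "="
            if (value == want) print file
        }'
}

# Example (needs getfattr):
#   getfattr -n user.article.venue *.txt | ea_filter newsforge
```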
What if I want to make a tarball containing all articles I wrote for newsforge that are longer than 1200 words? To find out which articles are longer than 1200 words, I use an old trick: I filter 'wc -w' through awk and output the names of all the files that fulfill a greater-than comparison. I can use the output of that as the input for the query, and I can use the output of the query as the input for tar:
tar -czf long_newsforge_articles.tgz $(ea_query user.article.venue \
    newsforge $(wc -w *.txt | awk '/\.txt/ { if ($1 > 1200) print $2 }'))
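The wc/awk filter relies on the '/\.txt/' pattern to skip the 'total' line that wc prints for multiple files, and awk's $2 only holds the first word of a file name containing spaces. A slightly more careful sketch, using a hypothetical longer_than function, counts each file separately (wc -w reading standard input prints only the number) and prints complete names, although passing its output through $(...) as in the example still splits names on whitespace.

```bash
# longer_than LIMIT FILE...: print the names of the files whose word count
# (as reported by wc -w) is greater than LIMIT, one per line.
longer_than() {
    local limit=$1 f n
    shift
    for f in "$@"; do
        n=$(wc -w < "$f")            # count only; some wc versions pad it
        if (( n > limit )); then     # arithmetic context ignores the padding
            printf '%s\n' "$f"
        fi
    done
}

# Example:
#   tar -czf long_newsforge_articles.tgz \
#       $(ea_query user.article.venue newsforge $(longer_than 1200 *.txt))
```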
A useful variation on the above example might involve compressing Newsforge articles containing a specific word or phrase. You can do that by using grep rather than wc and awk. Once you have put together a few handy bash functions or shell scripts for attribute manipulation, you should be able to integrate extended attribute queries into your repertoire of file management techniques with relative ease.
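The grep variation might look something like the following sketch. The containing function and the archive name are hypothetical; grep -l prints only the names of matching files, and -F treats the phrase as a fixed string rather than a regular expression.

```bash
# containing PHRASE FILE...: print the names of the files that contain
# PHRASE as a fixed string (not a regular expression), one per line.
containing() {
    local phrase=$1
    shift
    grep -lF -- "$phrase" "$@"
}

# Example (hypothetical archive name):
#   tar -czf linux_newsforge_articles.tgz \
#       $(ea_query user.article.venue newsforge $(containing "Linux" *.txt))
```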
An Arcane Example
For the benefit of system administrators and ambitious readers, I will now present a more sophisticated command-line example. I am going to show how I list, in order by publication date, the title and filename of every review I have written for Newsforge that has been published since November of 2004, is longer than 1000 words, and contains the word "Linux". For this example, you will need Ruby and Aredridel's excellent xattr module (http://theinternetco.net/projects/ruby/ruby-xattr). I have included line breaks to increase the readability of the example:
ruby -r xattr -e '
pubdate = proc {|f|
Time.gm(*f.get_attr("article.date.published").split("-"))
};
Dir["*review*.txt"].map {|fn| File.open fn }.find_all {|f|
c = f.read;
f.get_attr("article.venue") == "newsforge" and
c.split.length > 1000 and
c.include? "Linux" and
pubdate[f] > Time.gm(2004, "nov", 01)
}.sort {|f1,f2| pubdate[f1] <=> pubdate[f2] }.each {|f|
puts "#{f.get_attr("article.title")} #{f.path}"
}'
I start by defining a 'pubdate' function that will convert my "date.published" attribute string into a ruby 'Time' instance. Then, I use the 'Dir[]' class method to generate a list of all files in the current directory that match the "*review*.txt" glob. I filter that list of files through a 'find_all' block that performs the necessary checks, and then I pass the results to a 'sort' block that performs publication date comparisons using the 'pubdate' function. Finally, I send the title and name of each file to stdout in a concluding 'each' block. Note that I use the '-r xattr' parameter to include Aredridel's module.
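If the Ruby module is not available, the date part of that query can be approximated with standard tools. The sketch below, built around a hypothetical published_after function, reads getfattr output for the attribute (assumed here to be stored as user.article.date.published, following the user.article namespace used earlier) and prints the date and name of every file published after a cutoff, oldest first. It relies on the dates being zero-padded YYYY-MM-DD strings, which sort correctly as plain text; the Ruby version converts them to Time objects and does not need that assumption.

```bash
# published_after CUTOFF: read `getfattr -n user.article.date.published ...`
# output on stdin and print "DATE FILE" for each file whose date is later
# than CUTOFF, sorted oldest first. Dates must be zero-padded YYYY-MM-DD.
published_after() {
    awk -v cutoff="$1" '
        /^# file: / { file = substr($0, 9); next }
        index($0, "=") {
            date = substr($0, index($0, "=") + 1)
            gsub(/"/, "", date)                    # drop the surrounding quotes
            if (date > cutoff) print date, file    # string comparison
        }' | sort
}

# Example (needs getfattr):
#   getfattr -n user.article.date.published *review*.txt | published_after 2004-11-01
```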
Ruby lends itself well to command line administrative work. You can use variations of the above example for a wide variety of file and system management tasks in native Ruby, or you can output file names, and pipe the result to other shell commands.
If you have some programming experience, you can use the xattr module to build more elaborate scripts and utilities for manipulating extended attributes. For those who prefer Python for application development, the pyxattr module provides similar bindings.
Conclusion
Whenever I learn a new command or a new command line trick, I celebrate by putting it to good use. I've given you a good start. Now it's your turn. Find a creative way to use extended attributes, and demonstrate your command line prowess by leaving a comment with a few examples, or by sharing your experiences.
Read in the original layout at: http://www.linux.com/archive/feature/114027