December 8, 2008

Manage your mbox file with Archmbox

Author: Ben Martin

Archmbox lets you list, move, and copy messages from one mbox mail file to another, primarily for archiving messages. This tool lets you easily move all messages that are older than a given date into another (possibly compressed) mbox file, and you can also grab or delete messages by matching regular expressions against message headers.

Ubuntu Intrepid includes a package for archmbox, but there is no package in the Fedora 9 or openSUSE 11 repositories. I'll build from source using version 4.10.0 of archmbox on a 64-bit Fedora 9 machine. One slight hitch to using archmbox on a Fedora 9 machine is that the fuser executable is installed into /sbin, which is not normally in a non-root user's PATH. The ./configure command will fail to find fuser and stop. A workaround is to modify your PATH prior to running ./configure, as shown below.

$ tar xzvf /.../archmbox-4.10.0.tar.gz
$ cd ./archmbox*/
$ export PATH=$PATH:/sbin
$ type -all fuser
fuser is /sbin/fuser
$ ./configure
$ make
$ sudo make install
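
If you would rather not append /sbin to your PATH unconditionally, the workaround can be wrapped in a small check. This is only a sketch: the helper name ensure_cmd_dir is made up for illustration and is not part of archmbox or its build system.

```shell
# ensure_cmd_dir CMD DIR
# Append DIR to PATH only when CMD cannot already be found.
# (Hypothetical helper for illustration.)
ensure_cmd_dir() {
    cmd=$1
    dir=$2
    if ! command -v "$cmd" >/dev/null 2>&1; then
        PATH=$PATH:$dir
        export PATH
    fi
}

ensure_cmd_dir fuser /sbin
# Show where fuser was found, or say so if it is still missing.
command -v fuser || echo "fuser not found, even with /sbin on the PATH"
```

Run this in the same shell session before ./configure so that the modified PATH is the one configure sees.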

Archmbox has four basic modes: list, copy, move (archive), and delete (kill). The list mode lets you inspect your mbox files, and the other modes let you copy, archive, or delete messages. You can compress an output mbox file with gzip by supplying the --compress option, or use bzip2 by supplying the --bzip2 option.

At the end of the command line on which you invoke archmbox you list one or more input mbox files you want to operate on. You can also specify a directory name and use the -R or --recursive option to have archmbox descend recursively into that directory, operating on each mbox file it finds. What about the output mbox file names? By default, the output mbox file name is derived from the input file name with .archived appended. Use -n or --archive-name to explicitly name the output mbox file, and -p or --archive-path to specify the directory where you want the output mbox files to be written.

To test archmbox, I used the XFS filesystem mailing list archives in the examples below. Each month of the archives is split into its own file. Files are named year-month, where both are numeric values. Because this article was written a few days before the end of November, I'll focus on 2008-10 as the "most recent" mbox so that the examples are repeatable.

You always have to specify a starting time using either -d/--date or -o/--offset. The -d option takes a date in the format yyyy-mm-dd; -o takes the number of days before now. Using the -o -1 combination tells archmbox not to use any date threshold.
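
To see roughly which calendar date a given --offset corresponds to, you can do the arithmetic with the date command. The sketch below assumes GNU date (the -d option behaves differently on BSD systems), and offset_to_date is a hypothetical helper name. How archmbox treats messages that fall exactly on the threshold day is not covered here.

```shell
# offset_to_date DAYS [TODAY]
# Print the date (yyyy-mm-dd) that lies DAYS days before TODAY.
# TODAY defaults to the current date. Relies on GNU date's -d option.
offset_to_date() {
    days=$1
    today=${2:-$(date +%F)}
    date -u -d "$today $days days ago" +%F
}

offset_to_date 25 2008-11-26
offset_to_date 27 2008-11-26
```

With 2008-11-26 as the reference day, an offset of 25 lands on 2008-11-01 and an offset of 27 lands on 2008-10-30.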

The first command below attempts to list the complete contents of the mbox file using a relative path. With archmbox you have to specify the paths to mbox files as absolute paths, as the second, working, invocation shows.

$ archmbox --list -o -1 ./2008-10
./2008-10: use full path!
$ archmbox --list -o -1 `pwd`/2008-10

1 WinMax Video <info@varied Media Marketing The New Age of Growth!
2 Christoph Hellwig <hch@ls Re: [PATCH 2/3] Remove restricted_chown pa
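
If you script archmbox, a small function can turn relative names into the full paths it insists on. This is a minimal sketch in plain shell. abs_path is a hypothetical helper, and it assumes the directory containing the file already exists.

```shell
# abs_path FILE
# Print FILE as an absolute path. The directory holding FILE must exist;
# FILE itself does not have to.
abs_path() {
    dir=$(cd "$(dirname "$1")" && pwd) || return 1
    printf '%s/%s\n' "$dir" "$(basename "$1")"
}

abs_path ./2008-10
# Possible use: archmbox --list -o -1 "$(abs_path ./2008-10)"
```

Quoting the command substitution keeps the invocation intact when a directory name contains spaces, which the backtick form used above does not.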

Unfortunately the utility can handle only a single time threshold. If you want to grab all messages between two dates, you'll have to grab everything newer than the start of the range you are interested in, then run a second job on the result to delete everything newer than the end of that range. You could also use the --regexp option, which we'll look at in a moment, but then you'd have to construct a regular expression to match your date range instead of using time values.
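
The two-pass approach might look something like the following. This sketch only prints the commands rather than running them, and it uses only options described in this article (--copy, --kill, --reverse, -d, -n, and -p). The function name, dates, and paths are made up for illustration, and it assumes the archive ends up as DIR/NAME.archived, so check the result with --list before trusting it.

```shell
# print_range_commands START END NAME DIR MBOX
# Print (do not run) two archmbox invocations that together should leave
# only the messages dated between START and END in DIR/NAME.archived.
print_range_commands() {
    start=$1
    end=$2
    name=$3
    dir=$4
    mbox=$5
    # Pass 1: copy everything newer than START into the archive.
    printf 'archmbox --copy --reverse -d %s -n %s -p %s %s\n' \
        "$start" "$name" "$dir" "$mbox"
    # Pass 2: delete everything newer than END from that archive.
    printf 'archmbox --kill --reverse -d %s %s/%s.archived\n' \
        "$end" "$dir" "$name"
}

print_range_commands 2008-10-10 2008-10-20 mid-october /tmp/archive /tmp/mail/2008-10
```

Printing first makes it easy to review the commands before running anything that deletes mail.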

--copy mode is handy when you want to glue together mbox files, or parts of them. The command below will create a 2008-9-to-10.archived mbox file containing the contents of three single-month XFS mbox files.

$ archmbox --copy \
-o -1 \
-n 2008-9-to-10 \
`pwd`/2008-10 `pwd`/2008-09 `pwd`/2008-08

When you use -y/--copy or -a/--archive mode, messages are appended to the output mbox file. In the tests below I first remove the output mbox file to keep it from accruing messages along the way. I ran the tests on November 26 against an mbox that contains only messages from October, so using an --offset of 25 in the first command means that every message in the input mbox is copied. Using an --offset of 27 in the second command excludes the last days of October, so the number of copied messages decreases. The -r/--reverse option in the third command makes archmbox treat your time threshold not as the upper bound but as the lower bound, so that only messages newer than the threshold are copied. The 775 messages copied by the second command and the 43 copied by the third together account for all 818 messages in the mbox.

$ date
Wed Nov 26 16:14:04 EST 2008

$ rm -f 2008-10.archived;
$ archmbox --copy --totals --verbose 2 --offset 25 `pwd`/2008-10
Overall summary
Parsed messages: 818
Total used space (MB): 7.41
Copied messages: 818
Total saved space (MB): 7.41

$ rm -f 2008-10.archived;
$ archmbox --copy --totals --verbose 2 --offset 27 `pwd`/2008-10
Parsed messages: 818
Total used space (MB): 7.41
Copied messages: 775
Total saved space (MB): 7.15

$ rm -f 2008-10.archived;
$ archmbox --copy --totals --verbose 2 --offset 27 --reverse `pwd`/2008-10
Copied messages: 43

The -a/--archive option works like the --copy option, but messages that are copied are also removed from the input mbox. If you just want to throw away matching messages, -k/--kill works like --archive, but the messages that are to be archived are not copied before they're deleted.

If you are using archmbox to archive the older messages from your mbox file, you might like to use the --keep-flagged and --keep-unread options, which stop archmbox from archiving messages that are flagged or that you've not read yet.

There are two options that let you match a regular expression against the headers of a message. You can use both options a number of times to easily match against multiple headers. With many -x or --regexp options present, if any of the regular expressions matches a message then the mail matches. With many -X or --Regexp options, all of the regular expressions must match a message for the mail to match. If the regular expression uses all lower case, it is matched on a case-insensitive basis. If any capital letter appears in the regular expression, a case-sensitive match is performed.
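
The case rule and the any/all distinction can be imitated with grep to make them concrete. This is only a model of the behaviour described above, not archmbox's own code: archmbox may not use the same regular expression dialect as grep -E, smart_match is a hypothetical helper, and the header values are invented sample data.

```shell
# smart_match VALUE REGEX
# Succeed if REGEX matches VALUE. An all-lower-case REGEX is matched
# case-insensitively; any capital letter makes the match case-sensitive.
smart_match() {
    value=$1
    regex=$2
    case $regex in
        *[[:upper:]]*) printf '%s\n' "$value" | grep -Eq -- "$regex" ;;
        *)             printf '%s\n' "$value" | grep -Eiq -- "$regex" ;;
    esac
}

# Invented sample header values.
subject='Re: XFS Filesystem Corruption on the arm'
from='Christoph Hellwig <hch@example.org>'

# -x style: one matching expression is enough.
if smart_match "$subject" 'filesystem corruption' ||
   smart_match "$from" 'Dave Chinner'; then
    echo "any: match"
fi

# -X style: every expression has to match.
if smart_match "$subject" 'filesystem corruption' &&
   smart_match "$from" 'Christoph Hellwig'; then
    echo "all: match"
fi
```

Both tests print a match for this sample: the lower-case subject pattern matches regardless of case, and the From pattern, which contains capitals, has to match exactly.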

The -x and -X options take the case-sensitive name of the header to match against, an equals sign, and the regular expression to match that header against. In the examples below I first look for any messages with a subject about filesystem corruption. The second command finds only those messages about filesystem corruption that were also sent by Christoph Hellwig. Had I used the lower-case -x for both expressions in the second command, archmbox would have returned all messages matching either regular expression (a total of 199 messages).

$ archmbox --list --totals --offset -1 -x Subject='filesystem corruption' `pwd`/2008-10

12 Dave Chinner <david@fromo Re: XFS filesystem corruption on the arm(e
13 Eric Sandeen <sandeen@san Re: XFS filesystem corruption on the arm(e
14 Eric Sandeen <sandeen@san Re: XFS filesystem corruption on the arm(e

$ archmbox --list --totals --offset -1 -X Subject='filesystem corruption' -X From='Christoph Hellwig' `pwd`/2008-10

361 Christoph Hellwig <hch@in Re: XFS filesystem corruption on the arm(e
370 Christoph Hellwig <hch@in Re: XFS filesystem corruption on the arm(e

Mailbox '/home/ben/testing/archmbox/2008-10': 818 messages (7.41 MB)
For archive: 2 messages (0.01 MB)

Archmbox makes it trivial to select and archive any read messages from an mbox file, and compress the archived mbox for you. Although archiving is clearly the primary intended use for archmbox, you can also use it to peek into an mbox file, or grab a subset of messages from one or more mbox files based on time or regular expression constraints.

