February 1, 2007

Handling RAR and 7-Zip archives in Linux


The RAR and 7-Zip file compression formats originated on Windows, so support for them on Linux is not as automatic as it is for old Unix holdovers like Gzip and TAR. But with the right software, you can handle these compressed files without much trouble.



First, a little background. RAR -- short for Roshal Archive -- is a proprietary file compression format developed by Eugene Roshal. Roshal sells a commercial utility for Windows called WinRAR, but also offers uncompress-only clients for a number of operating systems at no charge.

7-Zip is a Windows application designed to handle as many compression formats as possible. Its native format is 7z, which uses a modern offshoot of the LZ77 compression algorithm. The 7-Zip Windows app and a 7z software development kit are available under the LGPL.

Both formats purport to achieve better compression ratios for common data types than older algorithms. They also support splitting large archives into multiple small volumes for easier mobility and error recovery. The combination of those two features has made them popular choices for online distribution of extremely large files such as ISO images.

7z and Linux

The 7-Zip app is open source, but remains Windows-only. For Linux users, the project links to a command-line client package named p7zip that provides two executables, 7z and 7za. The two have the same syntax and options, differing only in that 7za is a self-contained app compiled only for use with 7z and the essential Unix formats (tar, gzip, bzip2, etc.), while 7z uses a plugin architecture that allows it to support many additional compression formats.

The basic syntax is 7z function options filename.7z. To uncompress an archive and preserve the directory paths stored in it, use 7z x myfile.7z. The e function, as in 7z e myotherfile.7z, also extracts the files, but it ignores their stored paths and drops them all directly into the current working directory.
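The difference between the x and e functions is easiest to see side by side. The following is a minimal sketch, assuming p7zip's 7z executable is on your PATH. The two function names and the directory argument are illustrative and not part of p7zip. The sketch relies on 7z's -o switch, which sets the output directory and takes its value with no space in between.

```shell
# Hypothetical helper names; only the 7z functions (x, e) and the
# -o switch belong to p7zip itself.

# Extract with full paths: the archive's directory tree is recreated
# under the target directory given as $2.
extract_keep_paths() {
    7z x "$1" -o"$2"
}

# Extract flat: every file lands directly in the target directory,
# and the stored directory paths are discarded.
extract_flat() {
    7z e "$1" -o"$2"
}
```

For example, extract_keep_paths myfile.7z restored would rebuild the archive's directory tree inside a directory named restored, while extract_flat would put every file side by side in that directory.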

RAR and Linux

The RAR situation is a bit more complicated, due to the file format's proprietary compression scheme. The RARLAB site provides a no-charge proprietary uncompress-only client for Linux called unrar. It is offered as both RPM and Slackware packages for 32-bit Intel distros, and as standalone binaries for PowerPC, 64-bit Intel, and ARM Linux systems. Since RARLAB's unrar program is neither free software nor open source, you are unlikely to find it shipping with many Linux distributions. You can download a source code tarball from RARLAB, but the attached license explicitly forbids you from using the source code to develop any form of RAR encoder.

Another option is a GPLv2-licensed command-line tool developed by the Gna! project. Confusingly enough, the open source RAR decoder is also named unrar. Gna! unrar is designed as a wrapper around unrarlib, an open source RAR decoding library developed by Christian Scheurer and Johannes Winkelmann -- who are not part of Gna!.

Scheurer and Winkelmann developed unrarlib from the original RARLAB source code, but asked for and obtained permission from Eugene Roshal to release their work as free software. Thus, unrarlib is available under the GPLv2 and under RARLAB's original proprietary license.

That licensing arrangement would seem to clear a path for interested parties to do an end-run around RARLAB and create a competing RAR encoder from the original source code, but that hasn't happened yet. Scheurer himself isn't interested in pursuing it, saying that he prefers to use open source formats for making archives. "You cannot always choose in what format you get the data, so it is fine to have an open way to access it. But you can choose the way you create an archive. If you don't want closed source compression tools, there are good alternatives."

The proprietary unrar uses the same basic syntax as 7z and 7za. To uncompress an archive and preserve file paths, type unrar x myarchive.rar. In the GPL unrar, you simply add a hyphen before the x: unrar -x myotherarchive.rar.
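Because both programs install under the same name, a script has to know which one it is calling. The following is a rough sketch under that assumption. The wrapper name unrar_extract and the labels rarlab and gpl are invented for illustration; neither program recognizes them, and the sketch does not try to detect which unrar is installed.

```shell
# unrar_extract is a hypothetical wrapper. "rarlab" and "gpl" are
# labels chosen for this sketch; the caller must say which unrar
# is actually installed on the system.
unrar_extract() {
    flavor=$1
    archive=$2
    case $flavor in
        rarlab) unrar x "$archive" ;;    # proprietary RARLAB client
        gpl)    unrar -x "$archive" ;;   # GPL client from Gna!
        *)      echo "unknown unrar flavor: $flavor" >&2
                return 2 ;;
    esac
}
```

Calling unrar_extract rarlab myarchive.rar runs the first form shown above, and unrar_extract gpl myotherarchive.rar runs the hyphenated one.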

Today, unrarlib supports only up to version 2 of the RAR file format. Scheurer says he is working on adding support for the newer RAR3 format to unrarlib, but that he is not sure what reaction to expect from RARLAB.

GUI support

If you do most of your work from a Linux window manager, you're in luck. Both GNOME and KDE have graphical archive managers -- File Roller for GNOME and Ark for KDE. Recent versions of both use plugins to support a wide variety of archive formats, and rely on p7zip for 7z support and Gna! unrar for RAR support. Because Gna! unrar cannot read the newer RAR3 format, though, you may still need to install the proprietary unrar on your system as well.

As I mentioned at the beginning, both 7z and RAR support splitting large files into smaller chunks. However, in my tests, neither File Roller nor Ark recognized that a directory full of sequentially numbered files myfile.7z.001, myfile.7z.002, myfile.7z.003, and so on constituted one 7z file split into bite-sized chunks.

Thus, to get at the data inside, I needed to rejoin the split files into one on the command line using the cat command. cat myfile.7z.001 myfile.7z.002 myfile.7z.003 > myfile.7z will glue the partial files back together in order, naming the result myfile.7z. At that point, you could open the 7z archive up in File Roller or Ark, but as long as you are already at the command line, it is much quicker just to type 7z x myfile.7z, and voilà -- data, beautiful data, right at your fingertips.
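Typing every piece by hand gets tedious once an archive is split into dozens of volumes. The rejoining step can be sketched as a small shell function, assuming the pieces carry zero-padded three-digit suffixes as in the example above. The name join_volumes is made up for this sketch and is not part of p7zip.

```shell
# Rejoin the numbered pieces of a split archive into a single file.
# join_volumes is a hypothetical helper name, not part of p7zip.
join_volumes() {
    base=$1
    # The shell expands the pattern in sorted order, so zero-padded
    # suffixes (.001, .002, ...) are concatenated in sequence.
    set -- "$base".[0-9][0-9][0-9]
    # If nothing matched, the pattern is left unexpanded and names
    # no existing file; report that and stop.
    if [ ! -e "$1" ]; then
        echo "no volumes found for $base" >&2
        return 1
    fi
    cat "$@" > "$base"
}
```

With that defined, join_volumes myfile.7z followed by 7z x myfile.7z does the same job as the commands above, however many pieces there are.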
