Linux.com

Feature

File compression tools for Linux

By Shashank Sharma on September 09, 2005 (8:00:00 AM)

One source of confusion for new Linux users is the variety of archive and compression formats used by downloadable applications. "Should I use the tar.gz file, the zip file, or the tar.bz2 file?" they may wonder. Here's what you need to know about these formats in order to unpack and install any application.

First, consider the distinction between archiving and compression. Archiving means combining a number of files together into one file. The idea is to achieve easier storage and transportation. It's like having a briefcase in which to keep all your files. The archive must contain some information about the original files, such as their names and lengths, for proper reconstruction. This ensures that your paperwork will remain as-is when you open your briefcase. Some popular archive file formats are tar and zip.

Compression, on the other hand, is the process of using encoding schemes to store information in fewer bits than traditional representation would use. It's similar to the difference between shorthand writing and normal writing, where the former requires less paper than the latter. Common compression formats are zip, gz, and bz2.

Working with archives

The tar format is the most common archive format on *nix systems. Tar was originally designed for transferring files to and from tape drives -- its name is short for "tape archive." A tar archive is commonly called a tarball.

To archive files, use a command like tar -cf archive.tar file1 file2 file3. This command combines file1, file2, and file3 and stores them in archive.tar. The -c switch tells tar that you want to create an archive. The -f switch tells tar that the next argument, archive.tar, is the name of the archive file to use; without it, tar may try to use a default device such as a tape drive.

The command tar -xf archive.tar extracts all the files from archive.tar and stores them in the current directory with their original names.
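As a rough sketch of the create and extract steps, the session below builds a tarball from three throwaway files, lists its contents, and unpacks it. The scratch directory, the file contents, and the extracted directory name are assumptions made only for illustration.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Three small sample files, named as in the text above.
printf 'one\n' > file1
printf 'two\n' > file2
printf 'three\n' > file3

# -c creates an archive; -f names the archive file to write.
tar -cf archive.tar file1 file2 file3

# -t lists the contents of the archive without extracting anything.
tar -tf archive.tar

# -x extracts; -C makes tar change into the given directory first.
mkdir extracted
tar -xf archive.tar -C extracted
```

Listing with -t before extracting is a cheap way to see where the files in an unfamiliar tarball will land.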

Compressing files

In *nix land, gz and bz2 are two of the most common compression formats. Typically you use the gzip utility to create gz files and bzip2 to create bz2 files. The fundamental difference is in the compression algorithm: the one used by bzip2 usually produces smaller files. The downside is that bzip2 uses more memory and takes more time.

To compress a file using gzip, use the command gzip filename. The result is a file named filename.gz. Thus the command gzip homepage.htm yields homepage.htm.gz.

One thing to remember about gzip is that it replaces the original file with one that has a .gz extension.

To uncompress files, use either gzip -d or gunzip.
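The following sketch walks through one gzip round trip. The file contents and the names original.htm and homepage-copy.htm.gz are made up for the example.

```shell
# Work in a scratch directory with a small sample file.
workdir=$(mktemp -d)
cd "$workdir"
printf '<html>hello</html>\n' > homepage.htm
cp homepage.htm original.htm    # reference copy for the comparison below

# Compress: homepage.htm disappears and homepage.htm.gz takes its place.
gzip homepage.htm
ls

# Uncompress: gunzip behaves like gzip -d and restores homepage.htm.
gunzip homepage.htm.gz
cmp homepage.htm original.htm && echo "round trip OK"

# To keep the original, send the compressed data to standard output with -c.
gzip -c homepage.htm > homepage-copy.htm.gz
```

The -c form is handy whenever you want the compressed copy and the original side by side.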

bzip2 works much like gzip. It also replaces the original file, in this case with one that has a .bz2 extension. Decompressing .bz2 files is a breeze -- use bzip2 -d or bunzip2.
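If you are curious how the two tools compare on your own data, a quick test along these lines may help. The sample file numbers.txt is an invented example, the bzip2 step is skipped when bzip2 is not installed, and the result depends heavily on the input: data that is already compressed, such as JPEG pictures, will barely shrink with either tool.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"

# Generate a compressible sample file: the numbers 1 to 20000, one per line.
seq 1 20000 > numbers.txt

# -c writes to standard output, so numbers.txt itself is left in place.
gzip -c numbers.txt > numbers.txt.gz
if command -v bzip2 >/dev/null 2>&1; then
    bzip2 -c numbers.txt > numbers.txt.bz2
fi

# Print the size in bytes of the original and of each compressed copy.
wc -c numbers.txt*
```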

Both gzip and bzip2 maintain the ownership and permissions of the original file when compressing.
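You can check this for yourself with a short experiment such as the one below, shown here for gzip. The file name notes.txt and the 640 mode are arbitrary choices for the example.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"

# Create a file that only the owner can write and the group can read.
printf 'private notes\n' > notes.txt
chmod 640 notes.txt
ls -l notes.txt

# After compression, the .gz file should show the same permission bits.
gzip notes.txt
ls -l notes.txt.gz
```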

You can also use the zip utility to compress files, if you wish to share files with friends who use a non-*nix platform. zip files.zip file1 file2 file3 would compress the three files, display the rate of compression of each file, and store them in files.zip. The unzip program can be used to extract the contents of a zip file.
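Here is a small sketch of a zip round trip. It assumes the zip and unzip programs are installed (many minimal systems ship without them, so the example skips itself in that case); the file contents and the unpacked directory name are invented for illustration.

```shell
# Work in a scratch directory with three sample files.
workdir=$(mktemp -d)
cd "$workdir"
printf 'one\n' > file1
printf 'two\n' > file2
printf 'three\n' > file3

if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    # Archive and compress in one step; zip reports the ratio for each file.
    zip files.zip file1 file2 file3

    # -l lists the contents without extracting.
    unzip -l files.zip

    # -d extracts into the named directory; -q keeps the output quiet.
    unzip -q files.zip -d unpacked
else
    echo "zip or unzip is not installed; skipping"
fi
```

Unlike gzip and bzip2, zip leaves the original files in place.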

Compressed archives

Unlike zip, which offers compression and archiving functionality, tar is capable of archiving only. This means that after you create a tarball, its size is roughly the same as the cumulative size of the individual files (slightly larger, in fact, because tar adds a header for each file and pads the data into fixed-size blocks). To reduce the size of a tarball, you must compress it by using either gzip or bzip2:

tar -cf archived.tar file1 file2 file3
gzip archived.tar

This compresses archived.tar and replaces it with archived.tar.gz. You could also use bzip2 instead of gzip.
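If you would rather not leave an intermediate archived.tar on disk, the two steps can be joined with a pipe. The sketch below uses throwaway files and should work with any tar that can write to standard output.

```shell
# Work in a scratch directory with three sample files.
workdir=$(mktemp -d)
cd "$workdir"
printf 'one\n' > file1
printf 'two\n' > file2
printf 'three\n' > file3

# "-f -" makes tar write the archive to standard output,
# and gzip compresses whatever arrives on its standard input.
tar -cf - file1 file2 file3 | gzip > archived.tar.gz

# gzip -t checks the integrity of the compressed file.
gzip -t archived.tar.gz && echo "archived.tar.gz is a valid gzip file"
```

GNU tar can do the same in a single command with tar -czf archived.tar.gz file1 file2 file3, but the z switch is not available in every version of tar.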

How do you extract files from a compressed tarball? Use tar zxvf archived.tar.gz to extract all the files from a gzip-compressed tarball. The z switch tells tar that the tarball was compressed using gzip.

If you used bzip2 to compress this tarball, you'd get an error message if you used tar's z switch. To decompress a bzip2-compressed tarball, you need to use the j switch in its place. tar jxvf archived.tar.bz2 would extract the files.
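The session below sketches both switches, plus a pipe-based alternative for versions of tar that lack them. The directory names with-z, with-pipe, and with-j and the file second.tar are invented for the example, and the bzip2 part is skipped when bzip2 is not installed.

```shell
# Work in a scratch directory and build a gzip-compressed tarball.
workdir=$(mktemp -d)
cd "$workdir"
printf 'one\n' > file1
printf 'two\n' > file2
tar -cf archived.tar file1 file2
gzip archived.tar                 # leaves archived.tar.gz

# t lists the contents; z filters the tarball through gzip first.
tar -tzf archived.tar.gz

# x extracts; -C unpacks into the given directory.
mkdir with-z
tar -xzf archived.tar.gz -C with-z

# Without the z switch: decompress to standard output and let tar read it.
mkdir with-pipe
gunzip -c archived.tar.gz | (cd with-pipe && tar -xf -)

# For a bzip2-compressed tarball, j takes the place of z.
if command -v bzip2 >/dev/null 2>&1; then
    tar -cf second.tar file1 file2
    bzip2 second.tar              # leaves second.tar.bz2
    mkdir with-j
    tar -xjf second.tar.bz2 -C with-j
fi
```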

You may encounter compressed tarballs with other file extensions, such as tgz and tbz2. These are short for tar.gz and tar.bz2, respectively.

Though it may seem complex at first, that's all there is to archiving and compression under *nix. From here on, no matter what compression format you encounter, you'll know what you need to do.

Shashank Sharma is studying for a degree in computer science. He specializes in writing about free and open source software for new users.

Shashank Sharma specializes in writing about free and open source software for new users and moderates the Linux.com forum boards. He is the coauthor of Beginning Fedora, published by Apress.

Comments

on File compression tools for Linux

Note: Comments are owned by the poster. We are not responsible for their content.

the tar.bz2...

Posted by: Anonymous Coward on September 10, 2005 08:19 PM
...obviously

#

Re:the tar.bz2...

Posted by: Anonymous Coward on September 11, 2005 09:32 AM
the .bz2 is smaller than a .gz, but is also very much slower to decompress.

Good old gzip still gives decent compression and fast (de)compression speed.

#

Re:the tar.bz2...

Posted by: Anonymous Coward on September 11, 2005 05:50 PM
Slower to decompress, but faster to download, which (for me) is much more important.

#

zip

Posted by: Anonymous Coward on September 12, 2005 10:29 PM
is no good for a large number of files in an archive. It has a limit of either a signed or unsigned 16 bit value (can't recall which).

#

Strange

Posted by: Anonymous Coward on September 10, 2005 03:29 AM
Why suggest that things be done in two steps?

tar xvfj - uncompress and extract in one step

tar cvfj - archive and compress in one step

Easy.

#

Re:Strange

Posted by: Anonymous Coward on September 10, 2005 04:08 AM
Keep in mind, however, that the one step as indicated above only works for the GNU version of tar.

#

GNU tar rocks.

Posted by: Anonymous Coward on September 10, 2005 10:21 AM
The new versions of GNU tar can tell the compression type & the z/j flags will therefore be optional!

#

Re:Strange

Posted by: Anonymous Coward on September 10, 2005 11:41 AM

The most general way would be to use a pipe. For instance, to uncompress a .tar.bz2 archive:

bunzip2 -c archive.tar.bz2 | tar xf -

The bunzip2 -c command will output the uncompressed .tar data to the tar program, which will then extract it. One advantage of this method is that you can use a compression format other than gzip or bzip2.

I prefer the lzma compression (7-zip), which provides significant disk space savings over bzip2, though the compression speed is very slow (de-compression speed is not bad, though). You can create a .tar.7z archive like this:

tar cf - archive | lzma e archive.tar.7z -si

And decompress it with

lzma d archive.tar.7z -so | tar xf -

You could even create .tar.zip archives or .tar.rar archives. In this way, tar is very general; but of course, if you plan to distribute the archives, you should use .tar.gz or .tar.bz2, as those are the most common right now.

#

Re:Strange

Posted by: Anonymous Coward on September 13, 2005 01:10 AM
I still use the old way (tar cf - | gzip -c >file.tgz) for one reason-- to get specific info on the compression ratio. If you use "tar cvzf" you won't get info on how well the archive compressed, whereas if you pipe to "gzip -c" or "bzip2 -v", you can get compression information.

#

Re:Strange

Posted by: Administrator on September 10, 2005 06:30 AM
The key in this command is the "j" flag. It turns on bzip2 compression, resulting in a .bz2 file. Replace "j" with "z" and you'll get gzip compression, resulting in a .gz file.

#

Re(1):Strange

Posted by: Anonymous [ip: 72.159.58.197] on August 31, 2007 08:42 PM
Do one thing, do it well.

#

Bzip2 is MUCH slower than gzip

Posted by: Anonymous Coward on September 10, 2005 06:55 PM
It is not about memory... it is about compression speed

#

Weakness compared to rar

Posted by: Anonymous Coward on September 10, 2005 08:47 PM
Unfortunately bz2 and zip share the same weakness: they don't have any recovery record. After compressing a big tarball (several GB) you could face severe problems when trying to extract a single file. I know of a GPL version of unrar, but is there any GPL rar command which includes a recovery record?

#

7z

Posted by: Anonymous Coward on September 11, 2005 07:51 AM
There is a .7z file format which provides rather good compression.

It is not as widely used but I assume it's great for backups and such.

http://en.wikipedia.org/wiki/7z
http://www.7-zip.org/7z.html

#

Re:7z

Posted by: Anonymous Coward on September 13, 2005 02:08 AM
p7zip (http://p7zip.sf.net/) is a port of 7za.exe for POSIX systems like Unix (Linux, Solaris, OpenBSD, FreeBSD, Cygwin, ...), MacOS X and BeOS.

#

remarks

Posted by: Anonymous Coward on September 11, 2005 09:44 AM
>Typically you use the bzip2 utility to create bz
>files and gzip to create gz. The fundamental
>difference is in the compression algorithm used
>by bzip2, which results in considerably smaller
>files. The downside is that bzip2 eats up more
>memory.

Not only more memory, but a lot more time.

>Unlike zip, which offers compression and
>archiving functionality, tar is capable of
>archiving only. This means that after you create
>a tarball, its size is the same as the cumulative
> size of the individual files. To reduce the size
> of a tarball, you must compress it by using
>either gzip or bzip2:

>tar -cf archived.tar file1 file2 file3
>gzip archived.tar

For an explanation to newcomers, I think you correctly divide the tar and gzip steps, and avoid the unportable tar -zcf

>How do you extract files from a compressed
>tarball? Use tar zxvf archived.tar.gz to extract
>all the files from a gzip-compressed tarball.

But why introduce -z here?
I would say

$ gunzip archived.tar.gz
$ tar -xf archived.tar

and then explain about the GNU-tar-only z and j options/shortcuts.

I also tell new users to use the v (verbose) switch, in order to get the feel of what is being added or extracted.

#

RAR is the best!

Posted by: Anonymous Coward on September 11, 2005 08:43 PM
the RAR format by Eugene Roshal is still the best in terms of compression rate and speed.

#

Re:RAR is the best!

Posted by: Anonymous Coward on September 16, 2005 11:57 PM
i was just looking for a GPL version of a RAR utility yesterday. does anyone know if one exists? i know that rarlabs has their version--i would like to find one that is 100% free (as in freedom AND beer).

#

Re:RAR is the best!

Posted by: Anonymous Coward on April 23, 2006 01:18 PM
There's no such version, and AFAIK RAR compression algorithm is proprietary and the license specifically prohibits creation of code that performs RAR compression. As for the features, RAR is the best for two things: recovery and strong encryption. There's no particular sense in using it just to squeeze a few more kilobytes.

#

GUI please

Posted by: Anonymous Coward on September 11, 2005 11:50 PM
I am still looking for a GUI based archiver/compressor. I now use Konqueror's built in explorer (right click) 'feature' which is barely o.k. I am looking for real right-click integration with lots of (cascaded) options like I have in Windows+PkZip or Windows+PowerArchiver. As it is now: Linux sucks. Does anyone have any suggestions for me? Thanks.

#

Re:GUI please

Posted by: Anonymous Coward on September 12, 2005 12:12 AM
When you get over your fear of the command line, the GUI is seen exactly for what it is... a waste of resources.

#

Re:GUI please

Posted by: Anonymous Coward on September 12, 2005 05:33 AM
You don't like the right click menus? Create some you do like:
http://www.oreilly.com/catalog/linuxdeskhks/chapter/hack40.pdf

#

Re:GUI please

Posted by: Anonymous Coward on September 12, 2005 08:20 AM
Install ark (actually, it's part of KDE already) and it gives you right click menu on folders etc to create archives exactly like winzip.

What else do you need/want?

#

Re:GUI please

Posted by: Anonymous Coward on December 05, 2005 02:43 AM
What about a GUI for a NON-KDE user.

Many folks on older computers use Fluxbox, IceWm or XFce. There is not a good GUI archiver for these Window Managers to my knowledge.

If there is one, please let me know!

Rob

#

Re:GUI please

Posted by: Anonymous Coward on November 07, 2006 06:03 AM
Unfortunately, ark in KDE-linux is no match for winrar or winzip in windows.

Ark does not support rar (does not compress),
No password locking archives,
No way to make split archives,
No way to add recovery record as a friend above mentions,
It only supplies basic functions of making zip or tar.gz files etc. Nothing else.

I am desperately seeking a really functional GUI archiving tool in linux, this is why I have read this thread.

#

rzip

Posted by: Anonymous Coward on September 12, 2005 09:18 AM
Try rzip some time. Big on memory and CPU time but if you want the best in archiving compression, this is probably it.

#

Re:rzip

Posted by: Anonymous Coward on September 13, 2005 09:33 PM
Yep, rzip is amazing when it comes to things like log files - the compression ratio is on a completely different scale to bzip2 even.

The biggest drawback imho is the fact that rzip cannot compress streams - you can't pipe to/from rzip, which means you can't use it for compressing directories using tar, unless you do it in 2 steps.

#

As the tumbleweeds blow past us

Posted by: Anonymous Coward on September 15, 2005 03:43 AM
Perhaps a better title for the above article would be "Common File Compression Tools for Linux" as it doesn't seem to cover more than a few common ones.

I don't understand why people are devoting their time to writing articles with information already explained and available with much greater detail in tons of different places on the web! Time is better spent pointing out new and original details and pointing others to existing documentation and tutorials, guides, what have you, which already exist on the web, AND WRITE SOMETHING ORIGINAL, FRESH, AND NEW INSTEAD!

That said, RAR kicks ass. ;)

#

hi

Posted by: Anonymous Coward on October 05, 2005 05:40 AM
Ummm. For some reason on my system (Slackware 10.2), gzip compresses better than bzip2. I just checked it. I archived a folder full of pictures and the filesize was a bit smaller when I used gzip than when I used bzip2. Does this only happen if you archive something so small or is my system just weird? o.O;

#

Re:hi

Posted by: Anonymous Coward on November 28, 2005 11:17 PM
not sure, but i noticed a similar thing on my redhat system, except that in my case rar had worse compression. by almost 25% worse.

#
