Author: Joe Barr
Let’s say you want to get a mission-critical app from Ibiblio, not an uncommon situation for many. But when you try to FTP the download file, shutbox-0.3.tar.gz, there is no room at the inn. So instead of using an FTP client, use wget. The command to do so would be:
wget ftp.ibiblio.org/pub/linux/games/shutbox-0.3.tar.gz
By default, wget will try 20 times to get the desired file. If that’s not enough, you can specify the number of times to try by inserting the -t option followed by a larger number, like this:
wget -t 30 ftp.ibiblio.org/pub/linux/games/shutbox-0.3.tar.gz
Then just walk away and forget it, because wget doesn’t need a babysitter.
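For a download that really can be left unattended, a few more options pair well with -t. The sketch below is only one possible combination, not something taken from the tip above: -c resumes a partial file, --waitretry spaces out the retries, and -b with -o puts wget in the background and writes its progress to a log. The function name fetch_patiently and the log file name wget-download.log are made up for illustration.

```shell
# Hypothetical helper: keep retrying a busy server without supervision.
# The function name and the log file name are illustrative assumptions.
fetch_patiently() {
    url=$1
    # -t 30           try up to 30 times instead of the default 20
    # -c              resume a partially downloaded file
    # --waitretry=60  wait up to 60 seconds between retries
    # -b -o FILE      run in the background and log to FILE
    wget -t 30 -c --waitretry=60 -b -o wget-download.log "$url"
}

# Usage:
#   fetch_patiently ftp.ibiblio.org/pub/linux/games/shutbox-0.3.tar.gz
#   tail -f wget-download.log
```

Because the job runs in the background, the log file is the place to check whether the download eventually succeeded.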
Of course, FTP is just half the story. You can grab Web content just as easily. In fact, using the recursive option (that’s -r) with the command will let you build a replica of the target site on your own system. PLEASE NOTE: Recursive retrieval can put a heavy load on the target server and fill up your own disk if not used properly, so be careful what you wish for and what you wget.
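One way to keep a recursive fetch from getting out of hand is to cap its depth and slow it down. The following is a minimal sketch under assumed values: the depth of 2, the two-second wait, and the 200k rate limit are arbitrary choices, and polite_fetch is a hypothetical name. The options themselves (-l, --no-parent, --wait, --limit-rate) are standard wget options.

```shell
# Hypothetical helper: a restrained recursive download.
# Depth, wait, and rate values are arbitrary examples; tune them per site.
polite_fetch() {
    url=$1
    # -r                 follow links recursively
    # -l 2               but no more than two levels deep
    # --no-parent        never climb above the starting directory
    # --wait=2           pause two seconds between requests
    # --limit-rate=200k  cap bandwidth at roughly 200 KB/s
    wget -r -l 2 --no-parent --wait=2 --limit-rate=200k "$url"
}

# Usage (placeholder URL):
#   polite_fetch http://www.example.com/docs/
```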
Let’s say you’re about to leave work but want to read another article here at Linux.com. Your commuter train doesn’t provide Internet access. Call on wget! Download the story now, read it later. Enter this:
wget http://www.linux.com/article.pl?sid=04/12/13/1954228
Opening the file wget downloads in response to that command gives you a readable version of the story, but it really doesn’t look the same as it does online. How to fix that? Let’s add -pk to the command:
wget -pk http://www.linux.com/article.pl?sid=04/12/13/1954228
Those two options make all the difference. The -k option converts the links in the downloaded files so that they point to the local copies, making the page suitable for offline viewing instead of reaching out to the Internet to pull bits and pieces in when you need them. The -p option tells wget to get everything required to display the page, such as images and stylesheets. Now the story looks the same on the train ride as it did during your Coke break. Pretty neat.
A more common usage of wget’s HTTP side is to create a mirror image of a remote site. It’s a great way to back up Web sites you may have hosted elsewhere. I saw a tip on OpenDeveloper.org about how to do exactly that. I used it like this the first time:
wget -rpk --level=0 http://www.joebarr.org
But because the site is using PHP to deliver the pages, Mozilla couldn’t parse the pages linked to from the front page. After adding the -E option, which appends “.html” to CGI or PHP-generated pages, all the pages worked just fine.
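With -E folded into the earlier mirror command, the whole backup might look like the sketch below. The wrapper name backup_site, the default directory name site-backup, and the use of -P to choose where the copy lands are assumptions added for illustration; only the -rpkE --level=0 part comes from the text.

```shell
# Hypothetical helper: mirror a site into a local directory for backup.
# backup_site and the default "site-backup" directory are illustrative names.
backup_site() {
    url=$1
    dest=${2:-site-backup}
    # -r          recurse through the site
    # -p          fetch images, stylesheets, and other page requisites
    # -k          rewrite links to point at the local copies
    # -E          save generated pages with an .html extension
    # --level=0   no limit on recursion depth
    # -P DIR      store everything under DIR
    wget -rpkE --level=0 -P "$dest" "$url"
}

# Usage:
#   backup_site http://www.joebarr.org
#   backup_site http://www.joebarr.org joebarr-backup
```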
So fine, in fact, that when I tried to view some of the other pages offline, I got “Unauthorized access” messages. But don’t worry, you can provide user and password information using the --http-user and --http-passwd options. Of course, doing so over an insecure network is not good security practice.
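Typing a password on the command line also leaves it in your shell history and in the process list. As a hedged alternative, and assuming a wget release recent enough to include the --ask-password option (check your man page, since this option may not exist in older versions), wget can prompt for the password instead. The function name mirror_with_login is hypothetical.

```shell
# Hypothetical helper: mirror a password-protected site.
# Assumes a wget version that supports --user and --ask-password.
mirror_with_login() {
    user=$1
    url=$2
    # --user=NAME      user name to send to the server
    # --ask-password   prompt for the password instead of putting it
    #                  on the command line
    wget -rpkE --level=0 --user="$user" --ask-password "$url"
}

# Usage (placeholder user name):
#   mirror_with_login someuser http://www.joebarr.org
```

Prompting keeps the password off the command line, but it does not protect it in transit, so the warning about insecure networks still applies.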
Wget is a very cool tool, and there are many ways to use it productively. As always, ask the man for more information.