October 16, 2006

CLI Magic: Use cURL to measure Web site statistics

Author: Michael Stutz

cURL is a handy command-line network tool whose name stands for "client for URLs," but think of it as a "copy for URLs" -- it can copy to or from a given URL in any of ten different protocols.

cURL is sometimes mistaken for an updated wget, but it isn't one. The two utilities share some features and options but are distinctly different tools: wget is for downloading files from the Web, and is best used to mirror entire sites or parts of sites, which is something that cURL alone can't do.

cURL's job is to copy data to or from a given set of URLs; along with HTTP it recognizes the FTP, TFTP, GOPHER, TELNET, DICT, LDAP, FILE, HTTPS, and FTPS protocols. Other features include support for proxies, forms, cookies, SSL, client-side certificates, URL globbing, and very large files. Along with the curl command-line tool is a counterpart library, libcurl, that you can use to get cURL's functionality from within your own programs.

You can do a lot of neat tricks with curl. Here's a look at how you can copy to and from URLs, and then use cURL's reporting facilities to get simple Web server metrics from your operations.

Copy URLs

The curl tool supports the GNU-style --version option, which shows not only the version but also the protocols recognized, as well as any extra features that are compiled in. If you get stuck, --help gives a summary of options, while --manual gives an ASCII rendition of the entire man page plus a usage guide, in 40-odd pages. (Pipe these to less!) Also useful is the standard "verbose" option, -v, which tells you what the program is doing, step by step.

When you give curl a URL, its contents are retrieved and sent to standard output. To save the URL's output to a local file with the same name as the remote file, give the -O option; to specify a different name, use the -o option instead and give the name to use as an argument.
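
As a minimal sketch of the difference between the two options, the following copies a throwaway local file through a file:// URL, so it runs without a network connection. The directory and file names are made up for illustration; with a real site you would give an http:// URL instead.

```shell
# Scratch directories standing in for a remote site and a download folder
# (all names here are hypothetical).
src=$(mktemp -d)
dest=$(mktemp -d)
printf 'hello from curl\n' > "$src/report.txt"

(
    cd "$dest" || exit 1
    # -O reuses the remote file name, so this creates ./report.txt
    curl -s -O "file://$src/report.txt"
    # -o takes the local name as its argument, so this creates ./copy.txt
    curl -s -o copy.txt "file://$src/report.txt"
)

ls "$dest"
```

Both files end up with the same contents; only the local names differ.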

One of cURL's most useful features is a kind of URL "glob" support, which lets you specify a pattern as part of the URL to match multiple URLs. You can give a character range in brackets, such as [A-Z] or [0-9], and you can also give a list of alternatives in braces, such as {about,blog,news}. The only trick is knowing who expands the pattern. If you leave the URL unquoted, a shell such as bash expands the braces itself and hands curl a separate URL for each alternative, so when you're saving to files you have to give the -O option once per URL; if you quote the URL, curl does the expansion and a single -O covers every match. For example, suppose you want to grab all three versions of a manual. With an unquoted URL, you'd need a command like:

$ curl -O -O -O http://example.com/docs/manual.{html,pdf,tar.gz}
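
When the URL is quoted, the pattern reaches curl intact and curl does the matching itself. The sketch below assumes three small local files (hypothetical names) served through file:// URLs so that it runs offline. It also uses "#1" in the -o argument, which curl replaces with the current value of the first glob in the URL.

```shell
# Three small files standing in for remote pages (hypothetical names).
src=$(mktemp -d)
dest=$(mktemp -d)
for n in 1 2 3; do
    printf 'page %s\n' "$n" > "$src/page$n.txt"
done

# The URL is quoted, so the shell leaves [1-3] alone and curl expands it.
# "#1" in the -o argument stands for the current value of the first glob.
curl -s -o "$dest/saved_#1.txt" "file://$src/page[1-3].txt"

ls "$dest"
```

Naming the output with "#1" is one way to keep the results of a glob apart when the remote names would otherwise collide.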

For HTTP requests, you can use the -r option to request HTTP 1.1 byte ranges instead of entire files -- if the server has byte ranges enabled, curl receives only the specified bytes instead of the whole file. Byte offsets start at 0, which represents the beginning of the file, and both ends of a range are inclusive. For example, to grab the first 100 bytes:

$ curl -r 0-99 http://example.com/

Ranges don't have to begin with zero. To get bytes 100 through 200:

$ curl -r 100-200 http://example.com/

A negative value on its own counts back from the end of the document. To grab the last eight bytes:

$ curl -r -8 http://example.com/
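
To see the three range forms side by side, here is a small sketch that applies them to a 16-byte local file through a file:// URL. It assumes that your build of curl honors -r for FILE URLs as well as HTTP; the file name and contents are made up.

```shell
# A 16-byte sample file (hypothetical), fetched through a file:// URL.
tmp=$(mktemp -d)
printf '0123456789abcdef' > "$tmp/sample.bin"
url="file://$tmp/sample.bin"

first=$(curl -s -r 0-4 "$url")     # bytes 0 through 4, five bytes in all
middle=$(curl -s -r 10-12 "$url")  # bytes 10 through 12
last=$(curl -s -r -4 "$url")       # the final four bytes

printf '%s\n' "$first" "$middle" "$last"
```

Because both ends are inclusive, a range of 0-4 returns five bytes, not four.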

The -i option includes the server's response headers in the output, ahead of the contents of the URL. Alternatively, -I outputs only the headers, which is useful for seeing the OS and Web server software that a specified site is running. It also shows the date and time of the request, and the content length and type of the given URL. When the -I option is used on a FILE or FTP URL, you'll get the file size and modification time.
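
The FILE case is easy to try offline. This sketch runs -I against a local 16-byte file (a hypothetical name) and strips the carriage returns that end each header line, so the result is easier to pass to other tools. The exact set of headers printed may vary between curl versions.

```shell
# A 16-byte sample file (hypothetical) to inspect.
tmp=$(mktemp -d)
printf '0123456789abcdef' > "$tmp/sample.bin"

# -I asks for headers only; for a FILE URL curl builds them from the file's
# metadata. tr removes the carriage return at the end of each header line.
headers=$(curl -s -I "file://$tmp/sample.bin" | tr -d '\r')

printf '%s\n' "$headers"
```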

You can upload files by specifying them as arguments to the -T option. It supports the same kind of globbing as the URL argument:

$ curl -T "index-[01-99].html" ftp://ftp.example.com/pub/incoming/

By default, file uploads are given the same name as the source files, but you can specify a new name by including it in the target URL:

$ curl -T index-mine.html ftp://ftp.example.com/pub/incoming/index-yours.html

If you need to specify a username and password, give them as the argument to the -u option, separated by a colon. To upload standard input, give a hyphen as the argument to -T:

$ some-long-pipeline | curl -u bob:secret -T - ftp://ftp.example.com/pub/bob/results.txt

Get server metrics

cURL supports built-in runtime variables that you can use to perform ad hoc diagnostics and benchmarking, or to gather statistics about the accessibility of a given URL, site, or server (all times are given in seconds, and all sizes are in bytes):

  • content_type: the Content-Type value of the file
  • http_code: the numerical response code of the last HTTP(S) transfer
  • http_connect: the numerical code in the last proxy response to a CONNECT request
  • num_connects: number of new connections made in the transfer
  • num_redirects: number of redirection operations that were made
  • size_download: total size of downloaded data
  • size_header: total size of the headers
  • size_request: total size of the request
  • size_upload: total size of uploaded data
  • speed_download: average download speed
  • speed_upload: average upload speed
  • time_connect: time from the start until the remote host connection was made
  • time_namelookup: time from the start of the command until name resolution was finished
  • time_pretransfer: time from the start until the file transfer was about to begin
  • time_redirect: time for all redirection operations
  • time_starttransfer: time from the start until the first byte was about to be transferred; this includes time_pretransfer plus the time the server needed to calculate the result
  • time_total: time for the complete operation (to the millisecond)
  • url_effective: the last URL fetched

Output any of these variables with the -w option ("write-out"), giving the variables in the format %{name} as part of a quoted string. You can include any other text as part of that string, and do simple formatting by using \n for a newline or \t for a tab. For example:

$ curl -w '\nLookup time:\t%{time_namelookup}\nConnect time:\t%{time_connect}\nPreXfer time:\t%{time_pretransfer}\nStartXfer time:\t%{time_starttransfer}\n\nTotal time:\t%{time_total}\n' -o /dev/null -s http://linux.com/

Lookup time:    0.038
Connect time:   0.038
PreXfer time:   0.039
StartXfer time: 0.039

Total time:     0.039
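
Long format strings are easier to manage in a shell variable. The sketch below assumes a small local file (a hypothetical name) fetched through a file:// URL so that it runs offline, and reports a few of the size and URL variables rather than the timers. The single quotes keep the shell away from \t, \n, and %{...}, leaving them for curl to expand.

```shell
# A 16-byte sample file (hypothetical) to fetch.
tmp=$(mktemp -d)
printf '0123456789abcdef' > "$tmp/sample.bin"

# Keep the write-out format in a variable so the command line stays readable.
fmt='URL:\t%{url_effective}\nBytes:\t%{size_download}\nTotal:\t%{time_total}\n'

report=$(curl -s -o /dev/null -w "$fmt" "file://$tmp/sample.bin")
printf '%s\n' "$report"
```

The same variable can be reused across several curl calls, which keeps the reports in a consistent layout.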

To get the amount of time between when a connection is established and when the data actually begins to be transferred, subtract the value of time_pretransfer from time_starttransfer. You can automate this by sending the output to bc with echo:

$ echo "$(curl -s -o /dev/null -w '%{time_starttransfer}-%{time_pretransfer}' http://linux.com/)" | bc
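
If you run this subtraction often, it can be wrapped in a small shell function. The following is only a sketch: the function name first_byte_wait is made up, and it does the arithmetic with awk instead of bc so that the result is always printed with three decimal places.

```shell
# first_byte_wait URL: print time_starttransfer minus time_pretransfer,
# in seconds. The function name is hypothetical.
first_byte_wait() {
    curl -s -o /dev/null \
         -w '%{time_starttransfer} %{time_pretransfer}\n' "$1" |
        awk '{ printf "%.3f\n", $1 - $2 }'
}
```

Calling first_byte_wait http://linux.com/ then prints a single number that you can log or compare between runs.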

cURL offers other important options you'll want to use to check for timeouts or to control the transfer speed -- it has more than 100 options in total. By specifying huge URL ranges or calling curl from a loop, you can use the commands to do simple server load testing, or check for various failures by reading the variable output -- and since curl handles forms, you can even use it to test Web application speed.
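
As a rough sketch of calling curl from a loop, the function below fetches one URL several times and prints curl's exit status, the HTTP code, and the total time for each request. The function name, the output layout, and the timeout values are all assumptions chosen for illustration; --connect-timeout limits the connection phase and --max-time limits the whole request.

```shell
# probe URL [COUNT]: fetch URL COUNT times (default 5) and print one line per
# request: sequence number, curl exit status, HTTP code, total time in seconds.
# The function name and output layout are hypothetical.
probe() {
    url=$1
    count=${2:-5}
    i=1
    while [ "$i" -le "$count" ]; do
        # Give up on the connection after 5 seconds and on the whole
        # request after 10 (arbitrary values for this sketch).
        stats=$(curl -s -o /dev/null --connect-timeout 5 --max-time 10 \
                     -w '%{http_code} %{time_total}' "$url")
        status=$?
        printf '%s %s %s\n' "$i" "$status" "$stats"
        i=$((i + 1))
    done
}
```

A nonzero exit status in the second column marks a request that failed or timed out, so piping the output of something like probe http://linux.com/ 20 through awk or sort is enough for a quick look at failures and slow responses.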
