How to save time and traffic upgrading with apt-proxy


Author: Nathan Willis

June is Bandwidth Conservation Month (well, not officially, but let’s say that it is), so if you have multiple machines running an APT-powered Linux distribution such as Debian or Ubuntu, you should take a look at apt-proxy, a utility that caches package downloads in a shared pool for all interested parties on your LAN. This saves you both the time and the bandwidth it costs to download the same updates for more than one computer.

I was blissfully unaware of apt-proxy myself last week when I was upgrading three Ubuntu machines from 7.10 to 8.04, but the idea of caching the downloaded packages seemed so obvious that I was sure such a utility existed. After familiarizing myself with it I decided to test it out on the next batch of package updates that came down the wire.

Nothing about apt-proxy is Ubuntu-specific; it works equally well for any APT repository. Nevertheless, if you are trying it for the first time, it would be wise to look for distro-specific instructions or success stories from other users. And if you don’t get apt-proxy to work, you can always roll back the changes and update your system the way you used to.

An apt-proxy setup involves running the apt-proxy service itself on one machine that in turn acts as an APT repository for the others. For those others, no new software need be installed — adding the apt-proxy server to their respective /etc/apt/sources.list is the only configuration required.

In sources.list, the entry for a typical repository takes the form deb http://some.server.org/and/optional/path distribution component1 component2 ... componentN. The distribution element (e.g. gutsy) distinguishes between multiple distros or distro releases on the same server, and the component elements refer to discrete collections of packages (such as main, non-free, or security-updates) available for the specified distro. Luckily, those are all server-side APT issues, and apt-proxy does not have to worry about them.
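To make the anatomy of an entry concrete, here is a minimal Python sketch that splits one line into the elements described above. The SourceEntry and parse_entry names are my own, not part of APT, and the sketch assumes the simple one-line format shown here; it does not handle bracketed options that some entries carry between the deb keyword and the URL.

```python
from typing import NamedTuple


class SourceEntry(NamedTuple):
    kind: str            # "deb" or "deb-src"
    url: str             # the part apt-proxy needs to know about
    distribution: str    # e.g. gutsy or stable
    components: tuple    # e.g. ("main", "non-free")


def parse_entry(line: str):
    """Return a SourceEntry, or None for blank lines, comments, and anything else."""
    text = line.split("#", 1)[0].strip()   # drop trailing comments
    fields = text.split()
    if len(fields) < 3 or fields[0] not in ("deb", "deb-src"):
        return None
    return SourceEntry(fields[0], fields[1], fields[2], tuple(fields[3:]))


entry = parse_entry("deb http://dl.google.com/linux/deb/ stable non-free")
print(entry.url, entry.distribution, entry.components)
```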

But apt-proxy will have to find all of the requested packages, so you will have to jot down all of the URLs that make up the second element in each entry so that you can add them to apt-proxy’s config file. If all of your computers run the same version of the same distro, it will be a short list, but make note of the repositories you need before you begin.
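If you have more than a couple of machines, jotting the URLs down by hand gets tedious. The following is a rough sketch of one way to pull the distinct URLs out of the text of a sources.list file. The upstream_urls function and the sample text are hypothetical, and the sketch assumes plain one-line entries without bracketed options.

```python
def upstream_urls(sources_text: str) -> list:
    """Return the distinct repository URLs in sources.list text, in file order."""
    seen = []
    for line in sources_text.splitlines():
        fields = line.split("#", 1)[0].split()   # ignore comments
        if len(fields) >= 3 and fields[0] in ("deb", "deb-src"):
            url = fields[1].rstrip("/")          # treat ".../deb/" and ".../deb" alike
            if url not in seen:
                seen.append(url)
    return seen


# Hypothetical contents of one client's sources.list.
sample = """\
# main archive
deb http://ftp.us.debian.org/debian stable main
deb-src http://ftp.us.debian.org/debian stable main
deb http://dl.google.com/linux/deb/ stable non-free
"""

for url in upstream_urls(sample):
    print(url)
```

Feed it the contents of /etc/apt/sources.list from each client in turn, and the combined output is the list of repositories your apt-proxy server will need to know about.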

To get started, install the apt-proxy package on the computer you intend to use as a server with sudo apt-get install apt-proxy. Once it is installed, edit the file /etc/apt-proxy/apt-proxy-v2.conf. The default configuration is likely to work fine for most users, but familiarize yourself with the details at the start of the file just to be on the safe side.

The port setting (default 9999) is important; it is the TCP port on which apt-proxy will listen for connections. If you run a firewall, make sure this port is open and no other services are running on it. Likewise, cache_dir is set to /var/cache/apt-proxy, so you will want to ensure that that directory has sufficient space, particularly if you are upgrading an entire distro.
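Both checks are easy to do by hand, but if you would rather script them, a small Python sketch along these lines should do. The helper names are mine; nothing here comes from apt-proxy itself, and the port test only detects a service that is actually accepting TCP connections on the given host.

```python
import shutil
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


def free_gib(path: str) -> float:
    """Free space, in GiB, on the filesystem that holds path."""
    return shutil.disk_usage(path).free / 2**30
```

On the intended server, calling port_in_use(9999) before you install anything tells you whether another service has already claimed the default port, and free_gib("/var/cache") gives a rough idea of how much room the cache will have.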

Configuring your repositories

The latter half of the configuration file lists the APT servers that apt-proxy itself will connect to. This is the portion of the file you will definitely want to edit.

Each APT repository accessed by any of your client machines is recorded here as what the in-line comments call a “backend server.” Each backend server gets its own section, beginning with a name enclosed in [brackets]. You get to choose the name, so choosing a short and descriptive one can help if you use a lot of repositories.

You can list multiple alternate URLs in each backend server section to allow failover in case the main repository is unreachable. Don’t confuse the use of multiple URLs for each backend server with the fact that you can have multiple backend server sections. The alternate URLs within one bracketed backend server section must all be sources for the same set of packages. Apt-proxy tries the first URL in each section first, and looks at the others only if the first fails.

For example, if you use the main Debian repository and Google’s APT repository, you will need a [debian] section and a [google] section. The [debian] section can list both the main URL (http://ftp.us.debian.org/debian) and an alternate (say, ftp://ftp.uk.debian.org/debian).
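Here is a sketch of what those two sections might look like, wrapped in a few lines of Python that read them back with the standard configparser module. Treat the key name backends as an assumption on my part: it is how I recall the stock apt-proxy-v2.conf spelling it, so check the in-line comments in your own file. The backend_urls helper is purely illustrative.

```python
import configparser

# Assumed layout: one bracketed section per backend server, with its URLs
# listed under a "backends" key in order of preference.
SAMPLE_CONF = """\
[debian]
backends =
    http://ftp.us.debian.org/debian
    ftp://ftp.uk.debian.org/debian

[google]
backends = http://dl.google.com/linux/deb/
"""


def backend_urls(conf_text: str) -> dict:
    """Map each backend section name to its URLs, first choice first."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return {name: parser[name]["backends"].split() for name in parser.sections()}


for name, urls in backend_urls(SAMPLE_CONF).items():
    print(name, "->", urls[0], "plus", len(urls) - 1, "alternate(s)")
```

Note how the two URLs under [debian] are alternates for the same packages, while [google] is a separate backend server altogether.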

Notice also that you only list the URL for each repository, and not the distribution or components that are also included in a sources.list entry. Apt-proxy does not care about them; it will fetch whatever the apt-get clients request on a case-by-case basis.

Finally, start apt-proxy on the server machine with sudo /etc/init.d/apt-proxy start.

Configuring your clients

On every machine that you want to send through apt-proxy, edit /etc/apt/sources.list. For each repository entry, replace the original server’s URL with the address of your apt-proxy server (including the port number on which it runs), followed by a slash and the name of the corresponding backend server section you previously set up. Leave the “deb” keyword, the distribution, and the components unchanged.

For example, assume that your apt-proxy server is running on port 9999 on 192.168.1.101. If the client’s sources.list contained the line deb http://dl.google.com/linux/deb/ stable non-free for Google’s repository, you would substitute deb http://192.168.1.101:9999/google stable non-free.
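With several clients to convert, the substitution can be scripted. The sketch below shows one way to do it in Python. The rewrite_entry function is hypothetical, it works on a single line at a time, and it assumes the backends mapping uses the same names and URLs as the sections in your apt-proxy-v2.conf.

```python
def rewrite_entry(line: str, proxy: str, backends: dict) -> str:
    """Point one sources.list line at the proxy; return other lines unchanged."""
    fields = line.split()
    if len(fields) < 3 or fields[0] not in ("deb", "deb-src"):
        return line                      # comment, blank line, or unknown format
    url = fields[1].rstrip("/")
    for name, upstream in backends.items():
        base = upstream.rstrip("/")
        if url == base or url.startswith(base + "/"):
            # keep any path below the backend's URL, swap the rest for proxy/name
            fields[1] = f"{proxy}/{name}{url[len(base):]}"
            return " ".join(fields)
    return line                          # no matching backend: leave it alone


print(rewrite_entry(
    "deb http://dl.google.com/linux/deb/ stable non-free",
    "http://192.168.1.101:9999",
    {"google": "http://dl.google.com/linux/deb/"},
))
```

Lines that do not match any backend are returned untouched, so a repository you forgot to add to the server keeps working the old way instead of breaking.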

Over on the apt-proxy server, the apt-proxy-v2.conf file should have a backend server section labeled [google] that contains the original URL, http://dl.google.com/linux/deb/. When the client requests a package, the apt-proxy server receives the request, checks whether it has already cached the package, and if it has not, sends the request through to the upstream APT repository.

When you update the first client machine, apt-proxy will have to fetch its packages from the Internet as usual, so it won’t result in any speed-up. But for every subsequent machine making the same update, apt-proxy will serve up the requested packages at LAN speed — sparing you time and bandwidth usage.
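The saving is easy to see in a toy model. The Python sketch below is not how apt-proxy is implemented; it is just the fetch-on-miss idea reduced to a dictionary, with a made-up package path and a fake upstream that hands back 1000 bytes per request.

```python
class ToyPackageCache:
    """Serve packages from a local dict; go upstream only on a miss."""

    def __init__(self, fetch_upstream):
        self._fetch_upstream = fetch_upstream
        self._store = {}
        self.upstream_bytes = 0          # traffic that actually left the LAN

    def get(self, path: str) -> bytes:
        if path not in self._store:      # first request: fetch and remember
            data = self._fetch_upstream(path)
            self.upstream_bytes += len(data)
            self._store[path] = data
        return self._store[path]         # later requests: served locally


def fake_upstream(path: str) -> bytes:
    # Stand-in for a real download: every "package" is 1000 bytes.
    return b"x" * 1000


cache = ToyPackageCache(fake_upstream)
for machine in range(3):                 # three clients ask for the same package
    cache.get("pool/main/h/hello/hello.deb")
print(cache.upstream_bytes)
```

Three machines request the same package, but only the first request touches the fake upstream; without the cache, the same update would have been downloaded three times.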

The only question you’ll have will be “why didn’t I do this sooner?” But you’ll spend so little time doing your updates that you’ll hardly have a chance to mull it over.
