Weekend Project: Set Up Squid on Linux as a Caching Web Proxy

The Squid proxy server can function in many roles — HTTP accelerator, traffic filter, network logger, etc. — but its primary function is caching frequently-requested Web resources to save WAN bandwidth. A transparent caching proxy can intercept and cache HTTP traffic for your entire LAN, without the need to individually configure each browser. Because Squid has so many options, though, you need to set it up and test it before rolling it out in a production environment — making it a perfect weekend project.

The Bird’s Eye View

Before getting started, it is important to understand that Squid can cache HTTP traffic for the entire LAN, but it cannot transparently cache TLS/SSL connections, IMAP, XMPP, and many other types of traffic. In most cases, these are not connection types that you would want to cache for the entire LAN, but keeping this limitation in mind should also help you avoid the costly mistake of accidentally blocking this other traffic.

The exact setup required for your LAN will depend on its network topology. If you already have one machine serving as a gateway router (perhaps even DHCP server), this is the logical choice to configure as the Squid proxy. But if that machine is a low-power embedded router without disk space to use as a cache, you may need to select a different machine to be the proxy. In that case, you will need to configure Squid properly on the proxy machine, and configure the router to automatically redirect HTTP traffic from LAN clients to the proxy server.

Finally, it is highly suggested that you run as recent a version of Squid as possible. Because the package is provided by all major Linux distributions, you should not have difficulty getting or installing an up-to-date package, but check the version number to be sure. The latest release is the 3.1.x series; versions of Squid prior to 2.6 used a significantly different set of options.
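One way to check the version is to parse the banner printed by squid -v. The following is a minimal sketch rather than a definitive test: the helper names squid_version and version_at_least are hypothetical, and it assumes the banner has the usual "Squid Cache: Version X.Y.Z" form.

```shell
#!/bin/sh
# Hypothetical helper: read `squid -v` output on stdin and print the version
# number, assuming a banner such as "Squid Cache: Version 3.1.19".
squid_version() {
    sed -n 's/^Squid Cache: Version \([0-9][0-9.]*\).*/\1/p'
}

# Hypothetical helper: succeed if version $1 is at least major.minor $2.
version_at_least() {
    have_major=${1%%.*}
    have_rest=${1#*.}
    have_minor=${have_rest%%.*}
    want_major=${2%%.*}
    want_minor=${2#*.}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Only query Squid if it is installed and on the PATH.
if command -v squid >/dev/null 2>&1; then
    installed=$(squid -v | squid_version)
    if [ -z "$installed" ]; then
        echo "Could not parse the output of squid -v"
    elif version_at_least "$installed" 2.6; then
        echo "Squid $installed uses the 2.6-and-later option set"
    else
        echo "Squid $installed predates 2.6; expect different options"
    fi
fi
```

On many distributions the squid binary lives in /usr/sbin, so you may need to run the check as root or call it by its full path.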

Running the Proxy and Router on the Same Machine

The simpler configuration is found when Squid is running on the gateway machine; in this case all of the LAN clients are already directing their IP traffic to the machine because it is the default route to the rest of the Internet. The machine can therefore filter the packets passing through it based on their source and destination ports.

Thus, to cache the LAN clients’ traffic, we can use Linux’s iptables packet filter to redirect HTTP traffic heard on the LAN network interface to Squid, and configure Squid to cache and proxy those requests on the WAN network interface. If eth0 is the LAN interface and eth1 is the WAN interface, the following iptables rules will perform the port redirection:

iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3128
iptables -A INPUT -j ACCEPT -m state --state NEW,ESTABLISHED,RELATED -i eth0 -p tcp --dport 3128
iptables -A OUTPUT -j ACCEPT -m state --state NEW,ESTABLISHED,RELATED -o eth1 -p tcp --dport 80
iptables -A INPUT -j ACCEPT -m state --state ESTABLISHED,RELATED -i eth1 -p tcp --sport 80
iptables -A OUTPUT -j ACCEPT -m state --state ESTABLISHED,RELATED -o eth0 -p tcp --sport 80

The first line forwards LAN traffic on TCP port 80 to Squid’s default listening port, 3128. The second line accepts incoming traffic designated for port 3128, while the third allows outgoing connections on the WAN interface destined for port 80 (i.e., headed to the remote Web server). Line four accepts incoming WAN traffic from port 80, and last but not least, line five allows those connections to be delivered on the LAN interface.

The fun thing about iptables is that there is frequently more than one way to accomplish the same thing. The rules listed above are fairly simple; however, you may need additional configuration. If you have a second network adapter on the LAN side (such as “wlan0” for a WiFi card), duplicating the eth0 rules for wlan0 is straightforward. But if you have multiple LAN interfaces (as would be expected on a router with a built-in Ethernet switch), filtering incoming packets by the source IP address is a better idea. You could do that with iptables -t nat -A PREROUTING -s 192.168.1.0/24 -p tcp --dport 80 -j REDIRECT --to-port 3128 to match all clients in the 192.168.1.x block.
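Both variations can be sketched in a small script. This is only an illustration: the run wrapper, the function names, and the LAN_IFACES variable are assumptions introduced here, and by default the script prints the commands instead of executing them.

```shell
#!/bin/sh
# Dry run by default: commands are printed rather than executed.
# Set APPLY=1 and run as root to actually install the rules.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

SQUID_PORT=3128

# Redirect port-80 traffic arriving on one LAN interface to Squid.
redirect_iface() {
    run iptables -t nat -A PREROUTING -i "$1" -p tcp --dport 80 \
        -j REDIRECT --to-port "$SQUID_PORT"
}

# Alternative: match on the clients' source network instead of an interface.
redirect_subnet() {
    run iptables -t nat -A PREROUTING -s "$1" -p tcp --dport 80 \
        -j REDIRECT --to-port "$SQUID_PORT"
}

# Interface names are assumptions; substitute the ones on your router.
for iface in ${LAN_IFACES:-eth0 wlan0}; do
    redirect_iface "$iface"
done
```

Calling redirect_subnet 192.168.1.0/24 in place of the loop gives the source-address variant. Use one style or the other; installing both would simply match the same packets twice.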

To configure Squid itself, open up its configuration file /etc/squid/squid.conf. First, you need to set the visible_hostname parameter to your machine’s hostname; without this Squid will not start. Next, look at the http_access line; it may be set to deny all. You can change it to allow all for testing purposes (or if you simply trust everyone who will ever be on your network), but for better security you should consult the Squid documentation on setting up more fine-grained access control.

Lastly, it is time to enable transparent caching. Near the top of the squid.conf file, you will see an http_port line, which as mentioned above is set to port 3128 by default. To turn on transparent caching, all you need to do is add the word intercept to this line. Then, start or restart Squid from the command line with service squid restart.
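Taken together, the squid.conf changes described above might look like the fragment printed by the sketch below. The function name, the "localnet" ACL name, and the sample network are illustrative assumptions, and restricting access to the LAN range is only one possible policy, not a substitute for reading the access-control documentation.

```shell
#!/bin/sh
# Print a minimal squid.conf fragment for transparent caching.
# $1 = hostname Squid should report, $2 = LAN network in CIDR form.
# The "localnet" ACL name and the sample values are illustrative assumptions.
squid_fragment() {
    cat <<EOF
visible_hostname $1
acl localnet src $2
http_access allow localnet
http_access deny all
http_port 3128 intercept
EOF
}

# Review the output, then merge it into /etc/squid/squid.conf by hand.
squid_fragment "$(uname -n)" 192.168.1.0/24
```

After merging the lines, squid -k parse checks the configuration for syntax errors before you restart the service.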

This should give you a working cache, but you should also read the squid.conf in-line documentation to get a feel for how some of the other options can affect performance. In particular, Squid uses the /var/spool/squid/ directory by default for cached object storage. The less room you have available in /var/, the worse your performance will be, as objects expire sooner from the cache. Similarly, you want to choose sane values for maximum_object_size and maximum_object_size_in_memory.
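As a rough starting point for sizing, the sketch below looks at the free space under /var and prints candidate squid.conf lines. The suggest_cache_mb helper and the 80% figure are arbitrary assumptions, and the object-size values are placeholders to tune for your own traffic, not recommendations from the Squid project.

```shell
#!/bin/sh
# Hypothetical helper: given free space in MB, print a cache size in MB that
# leaves 20% headroom. The 80% figure is an arbitrary assumption.
suggest_cache_mb() {
    echo $(( $1 * 80 / 100 ))
}

# Free space (in MB) on the filesystem holding /var; falls back to 0 if unknown.
avail_mb=$(df -Pm /var 2>/dev/null | awk 'NR==2 {print $4}')
avail_mb=${avail_mb:-0}

# Candidate squid.conf lines; the object-size values are placeholders to tune.
echo "maximum_object_size 32 MB"
echo "maximum_object_size_in_memory 512 KB"
echo "cache_dir ufs /var/spool/squid $(suggest_cache_mb "$avail_mb") 16 256"
```

The third field of cache_dir is the cache size in megabytes; the last two numbers are the first-level and second-level directory counts, shown here at their customary values of 16 and 256.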

Running the Proxy and Router on Different Machines

If the Squid service is not running on the gateway machine (and thus, on the default route for IP traffic), then the setup becomes more complicated on the routing side. You should set up the Squid server itself as before, using the intercept flag in squid.conf.

On your LAN’s default router, however, you will need to use a different set of iptables rules. The plan is to redirect LAN traffic destined for TCP port 80 to the Squid server, unless that traffic originates from the Squid server itself. That way Squid can still make connections to the outside world.

As was the case before, there are several methods to do this with Linux’s packet-filtering functionality. An approach similar to the example shown above (using the “nat” table) might look like this, on the router:


iptables -t nat -A PREROUTING -i eth0 ! -s 192.168.1.101 -p tcp --dport 80 -j DNAT --to 192.168.1.101:3128
iptables -t nat -A POSTROUTING -o eth0 -s 192.168.1.0/24 -d 192.168.1.101 -j SNAT --to 192.168.1.1
iptables -A FORWARD -s 192.168.1.0/24 -d 192.168.1.101 -i eth0 -o eth0 -m state --state NEW,ESTABLISHED,RELATED -p tcp --dport 3128 -j ACCEPT
iptables -A FORWARD -d 192.168.1.0/24 -s 192.168.1.101 -i eth0 -o eth0 -m state --state ESTABLISHED,RELATED -p tcp --sport 3128 -j ACCEPT

Here, eth0 is again the LAN interface and eth1 the WAN interface. Squid is running on 192.168.1.101, while 192.168.1.1 is the LAN’s gateway router. Line one redirects HTTP traffic not (“!”) from the Squid server to the Squid server at its default port 3128. Line two takes the redirected traffic on its way out and performs source network address translation, so that the packets appear to have originated on the router itself. Lines three and four simply tell the router to permit traffic to pass unaltered between the Squid server and the LAN clients.
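Because the same addresses recur in all four rules, it can help to keep them in variables. The sketch below restates the rules as described in the paragraph above; the variable names, the run wrapper, and the router_rules function are assumptions made for illustration, and the script only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry run by default; set APPLY=1 and run as root on the router to install.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Addresses taken from the description above; adjust to your network.
LAN_IF=eth0
LAN_NET=192.168.1.0/24
ROUTER_IP=192.168.1.1
SQUID_IP=192.168.1.101
SQUID_PORT=3128

router_rules() {
    # 1. Send LAN HTTP traffic to Squid, except Squid's own requests.
    run iptables -t nat -A PREROUTING -i "$LAN_IF" ! -s "$SQUID_IP" \
        -p tcp --dport 80 -j DNAT --to-destination "$SQUID_IP:$SQUID_PORT"
    # 2. Make the redirected packets appear to come from the router.
    run iptables -t nat -A POSTROUTING -o "$LAN_IF" -s "$LAN_NET" \
        -d "$SQUID_IP" -j SNAT --to-source "$ROUTER_IP"
    # 3. Permit forwarding from the clients to Squid...
    run iptables -A FORWARD -s "$LAN_NET" -d "$SQUID_IP" -i "$LAN_IF" -o "$LAN_IF" \
        -m state --state NEW,ESTABLISHED,RELATED -p tcp --dport "$SQUID_PORT" -j ACCEPT
    # 4. ...and the replies from Squid back to the clients.
    run iptables -A FORWARD -d "$LAN_NET" -s "$SQUID_IP" -i "$LAN_IF" -o "$LAN_IF" \
        -m state --state ESTABLISHED,RELATED -p tcp --sport "$SQUID_PORT" -j ACCEPT
}

router_rules
```

Once the rules are installed, iptables -t nat -L -n -v shows packet counters for each nat rule, which is a quick way to confirm that client traffic is actually matching.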

A very different approach is suggested at the official Squid project wiki, utilizing the ip utility from Linux’s iproute2 package. In this method, you define a special routing table for your proxied traffic by adding a line such as 222 squidtraffic to /etc/iproute2/rt_tables, then define the rule and the route with ip rule add fwmark 2 table squidtraffic; ip route add default via 192.168.1.101 table squidtraffic.

You then set up an iptables rule that “marks” LAN HTTP traffic with firewall mark #2 (selected in the ip rule command; the number itself is an arbitrary choice):


iptables -t mangle -A PREROUTING -p tcp --dport 80 -s 192.168.1.101 -j ACCEPT
iptables -t mangle -A PREROUTING -i eth0 -p tcp --dport 80 -j MARK --set-mark 2
iptables -t mangle -A PREROUTING -m mark --mark 2 -j ACCEPT

An advantage of this approach is that it does not rely on Network Address Translation, so if you do not use NAT, you do not have to start. On the other hand, it does split the configuration up into multiple pieces, which can be confusing when you revisit the configuration eight months later, having forgotten the details.
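One way to ease the split-configuration problem is to keep the ip and iptables commands together in a single script. The sketch below does that as a dry run; the variable names and the policy_route_rules function are assumptions made for illustration, and it presumes the 222 squidtraffic line has already been added to /etc/iproute2/rt_tables.

```shell
#!/bin/sh
# Dry run by default; set APPLY=1 and run as root on the router to install.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

SQUID_IP=192.168.1.101
LAN_IF=eth0
FWMARK=2             # arbitrary; must match in the ip rule and the MARK rule
TABLE=squidtraffic   # must already be listed in /etc/iproute2/rt_tables

policy_route_rules() {
    # Routing: marked packets use a table whose default route is Squid.
    run ip rule add fwmark "$FWMARK" table "$TABLE"
    run ip route add default via "$SQUID_IP" table "$TABLE"
    # Marking: skip Squid's own requests, mark everything else on port 80.
    run iptables -t mangle -A PREROUTING -p tcp --dport 80 -s "$SQUID_IP" -j ACCEPT
    run iptables -t mangle -A PREROUTING -i "$LAN_IF" -p tcp --dport 80 \
        -j MARK --set-mark "$FWMARK"
    run iptables -t mangle -A PREROUTING -m mark --mark "$FWMARK" -j ACCEPT
}

policy_route_rules
```

Since this method delivers packets to the Squid machine with their original destination address and port, that machine most likely still needs its own REDIRECT rule to port 3128, along the lines of the single-machine setup; check the Squid wiki for the details that apply to your version.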

Making it Persistent, and a Little Extra Credit

Whichever approach you take, in order to make your changes survive a reboot you will need to save your iptables (and/or ip) commands in a script that gets executed once the network adapters are brought up by the boot process. How best to do that varies a little by distribution. Ubuntu and Fedora, for example, use Upstart, so you can save your script with a meaningful name in /etc/init/, although you must configure it according to Upstart’s specific rules. OpenSUSE and Debian users can use a more traditional shell script format and save it in /etc/init.d/.
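For the traditional shell script format, a bare-bones skeleton might look like the following. It is a sketch only: the script name is hypothetical, it manages just the single REDIRECT rule from the same-machine setup, and it omits the header comments that your distribution's init system may require.

```shell
#!/bin/sh
# Sketch of a traditional init-style script (hypothetical name:
# /etc/init.d/squid-redirect). Dry run by default; set APPLY=1 to execute.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# The rule being managed; -A appends it on start, -D deletes it on stop.
rule() {
    run iptables -t nat "$1" PREROUTING -i eth0 -p tcp --dport 80 \
        -j REDIRECT --to-port 3128
}

main() {
    case "$1" in
        start)   rule -A ;;
        stop)    rule -D ;;
        restart) rule -D; rule -A ;;
        *)       echo "Usage: $0 {start|stop|restart}" >&2; return 2 ;;
    esac
}

# Only dispatch when an action was given on the command line.
if [ $# -gt 0 ]; then
    main "$@"
fi
```

An alternative that avoids hand-written scripts is to dump the working ruleset with iptables-save and reload it at boot with iptables-restore.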

For an entirely different approach to the caching proxy problem, consider looking at the Web Cache Communication Protocol (WCCP), which is also supported by Squid. WCCP was developed by Cisco for use in large deployments (with clusters of caches and routers), but for complex networks, it could be easier to maintain than multiple independent routers, each with a different set of iptables rules.

Finally, if you are already running Squid successfully with a different packet filtering setup, consider sharing it with other readers here. A lot of the rules depend heavily on network topology and the version of Squid being used, so your example may help out someone working on supporting a similar configuration, saving them the bandwidth required to hunt it down on their own.