July 1, 2004

A parent's guide to Linux Web filtering

Author: Joe Bolin

Having converted quite a few people to the world of GNU/Linux, I am
often asked by parents, "Can I set up parental Web filters for my
children using Linux?" The answer is yes, and here's how.

A Web filter is software that controls the type of content a Web
browser displays. The filter checks the content of a Web page against a
set of rules and replaces any unwanted content with an alternative Web
page, usually an "Access Denied" page. The type of content to be
filtered is usually controlled by a systems administrator or a parent.
Web filters are used in schools, libraries, and homes to safeguard
children from obscene content on the Internet.

Before you begin, you should be familiar with some basic networking
concepts:

  • A server, as in "Web server," is nothing more than an application
    that runs on a computer and listens for incoming requests. It sends
    back, or serves, information to the source that requested the
    information. This information can be anything from Web pages to
    databases. Each server communicates through the use of an IP address
    and a port number.
  • Ports are logical addresses that applications on a computer use in a
    way similar to how we use phone numbers. Each server program must
    have a unique port that it uses for communications.
  • Every computer connected to the Internet has both an external IP
    (Internet Protocol) address, usually assigned by an Internet service
    provider, and an internal address of 127.0.0.1. The internal address
    allows the computer to "listen" and "talk" to itself and is referred
    to as the loopback address. Normally a server is set up to accept
    requests from other computers on the Internet by listening on its
    external address. Since this can present a security risk for our
    single computer, we will use the loopback address instead. This will
    cause our server to listen only for requests from the computer that
    the server resides on.
  • A firewall is an application that controls the types of communication
    your computer can send and receive. GNU/Linux has an excellent
    firewall called netfilter/iptables, or simply iptables, built right
    into the kernel, which we will use to redirect users' Web surfing
    through our Web filter.
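
The ideas in the list above can be tried out with a few lines of Python.
This is only a sketch and not part of the filter setup: it starts a tiny
server bound to the loopback address, lets the operating system pick a
free port, and sends it one request. The names serve_once and demo are
made up for this illustration.

```python
import socket
import threading


def serve_once(listener):
    """Accept a single connection and echo the request back with a prefix."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"served: " + request)


def demo():
    """Run a one-shot server on the loopback address and talk to it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Binding to 127.0.0.1 means only programs on this computer can connect.
    # Port 0 asks the operating system for any free port.
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    host, port = listener.getsockname()

    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()

    # The client needs both the address and the port to reach the server.
    with socket.create_connection((host, port)) as client:
        client.sendall(b"hello")
        reply = client.recv(1024)

    worker.join()
    listener.close()
    return host, reply


if __name__ == "__main__":
    print(demo())
```

Squid and DansGuardian are bound to 127.0.0.1 in the same way later in
this article, each on its own port.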

Getting the software

The only software you need to set up parental filters under
GNU/Linux is iptables, DansGuardian, and Squid.

DansGuardian is the actual filtering software. It supports phrase
matching, which allows you to block Web sites that contain certain
phrases or words; PICS filtering, which blocks content that's been
labeled as possibly objectionable by the creator of the Web site; URL
filtering, which blocks content from specific sites that are known to
contain offensive material; and blacklists, or lists of sites that
contain content you want to block. Blacklists usually come from third
parties, though you can create and maintain your own.
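
To make two of these checks concrete, here is a deliberately simplified
Python sketch of a blacklist lookup followed by phrase matching. It is
not DansGuardian's actual algorithm, and the function name, the sample
host, and the sample phrase are invented for the example.

```python
from urllib.parse import urlsplit


def is_blocked(url, page_text, banned_phrases, blacklisted_hosts):
    """Return the reason a page would be blocked, or None if it passes.

    A toy model only: the host is checked against a blacklist first,
    then the page text is searched for any banned phrase.
    """
    host = (urlsplit(url).hostname or "").lower()
    if host in blacklisted_hosts:
        return "blacklist"

    text = page_text.lower()
    for phrase in banned_phrases:
        if phrase.lower() in text:
            return "phrase"
    return None


if __name__ == "__main__":
    blacklist = {"bad.example.com"}      # hypothetical blacklisted host
    phrases = ["forbidden words"]        # hypothetical banned phrase
    print(is_blocked("http://bad.example.com/", "hello", phrases, blacklist))
    print(is_blocked("http://ok.example.com/", "Some Forbidden Words here",
                     phrases, blacklist))
    print(is_blocked("http://ok.example.com/", "a harmless page",
                     phrases, blacklist))
```

A page that matches either check would be replaced by the "Access
Denied" page; anything else is passed through to the browser.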

Squid is a Web proxy server that acts as a middleman between your
computer and the Internet. You need a proxy server because DansGuardian
isn't able to fetch Web pages by itself. We'll configure Squid as a
transparent proxy, meaning we'll hijack network traffic and redirect it
to a new destination -- our filter program, in this case -- without the
user needing to know that it is happening.

Most modern distributions have packaged versions of Squid and
DansGuardian available. If yours doesn't, you will need to install them
from source code. Both the Squid and DansGuardian Web sites have
complete instructions for how to compile and install the programs from
source.

Iptables is the firewall management tool used with the 2.4.x and
higher kernels. Most modern distributions provide iptables. If yours
doesn't, you will need to compile a new kernel and enable iptables,
which is beyond the scope of this article (and probably beyond the
abilities of most parents). You'd probably be better off upgrading to a
newer Linux distribution.

Configuring Squid

The default location for the Squid configuration file on most
systems is /etc/squid/squid.conf. While most of the default settings
for Squid are all right for our usage, you will need to edit the
configuration file just a bit.

You will need to become the root user in order
to make the changes and issue the commands shown in this article. You
can do this by either logging in as root or with the su command.

Add or edit the following line to have Squid listen only on the
loopback device on port 3128. This causes Squid to act only as a proxy
server for this computer and assigns it a specific port number to
listen on:

http_port 127.0.0.1:3128

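To configure Squid as a transparent proxy, add the following lines
to squid.conf. These directives are the Squid 2.5 syntax; Squid 2.6 and
later dropped them in favor of an option on the http_port line: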

httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on

Your system should have created a user and a group named squid
when you installed Squid. If it didn't, you should create them yourself
by using the following two commands from the command line:

groupadd -r squid
useradd -g squid -d /var/spool/squid -s /bin/false -r squid

Since Squid is normally started by the system and run as root, you
need to add the next two lines to /etc/squid/squid.conf in order to
make Squid run with squid's user and group IDs:

cache_effective_user squid
cache_effective_group squid

We will later use this to identify Squid to our firewall. Then we
will allow the user squid to access the Internet while we redirect all
other Web traffic through our filter.

Configuring DansGuardian

Our next step is to configure DansGuardian. The default location on
most systems for the main configuration file is
/etc/dansguardian/dansguardian.conf. Once again, most of the default
values are fine, but we need to make a few changes.

First, add or edit the following line to make the filter use HTML
templates, which are static Web pages that our filter will use to
display the "Access Denied" page instead of the inappropriate sites.
Using HTML templates keeps us from having to set up a Web server to
display the "Access Denied" information.

reportinglevel = 3

Next, add or edit the following lines to make DansGuardian listen
on the loopback address and port 8080:

filterip = 127.0.0.1
filterport = 8080

Add or edit the following lines to tell DansGuardian which address and
port Squid is listening on. This enables our filter to fetch the
requested Web content through the proxy.

proxyip = 127.0.0.1
proxyport = 3128
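
Because the filter can only fetch pages if these two values match the
http_port line in squid.conf, a quick consistency check can save some
head-scratching. The Python sketch below is one hypothetical way to do
that; the function names are invented, and it assumes both files use #
for comments and the simple line formats shown in this article.

```python
def parse_dansguardian(text):
    """Collect 'key = value' settings, ignoring comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Values such as 'squid' may be quoted; strip the quotes.
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def squid_listen_address(text):
    """Return (host, port) from the first http_port line, or None."""
    for raw in text.splitlines():
        parts = raw.split("#", 1)[0].split()
        if len(parts) >= 2 and parts[0] == "http_port":
            host, _sep, port = parts[1].rpartition(":")
            return host, int(port)
    return None


def proxy_settings_match(dansguardian_text, squid_text):
    """True if proxyip/proxyport point at the address Squid listens on."""
    settings = parse_dansguardian(dansguardian_text)
    if "proxyip" not in settings or "proxyport" not in settings:
        return False
    wanted = (settings["proxyip"], int(settings["proxyport"]))
    return squid_listen_address(squid_text) == wanted


# Sample contents mirroring the settings used in this article.
SQUID_SAMPLE = "http_port 127.0.0.1:3128\n"
DG_SAMPLE = """
filterip = 127.0.0.1
filterport = 8080
proxyip = 127.0.0.1
proxyport = 3128
"""

if __name__ == "__main__":
    print(proxy_settings_match(DG_SAMPLE, SQUID_SAMPLE))  # prints True
```

To check a real system, the two sample strings could be replaced with
the contents of the actual configuration files, read as root.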

Again, to keep your filter from running as root you need to change
the user that it will run as. For simplicity, we will reuse the user and
group that we previously set up for Squid. Add or edit the following to
make DansGuardian run with UID and GID of squid:

daemonuser = 'squid'
daemongroup = 'squid'

While DansGuardian provides an excellent filter all by itself, you
may want to exercise further control over the Web filtering by editing
the other files in the /etc/dansguardian directory that contain
external blacklists. Blacklists from squidGuard and URLBlacklist work perfectly with
DansGuardian. Each file contains a brief explanation of its contents
to make configuration easier.

Putting it in action

Once you have Squid and DansGuardian set up, the final step is to
implement a transparent proxy using iptables. Use the following
commands at the command line to add rules to the firewall to allow the
user squid to access both the Internet and the Squid proxy we set up.

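# Let the squid user reach Web servers directly
iptables -t nat -A OUTPUT -p tcp \
  --dport 80 -m owner --uid-owner squid -j ACCEPT

# Let the squid user (our filter) reach the Squid proxy
iptables -t nat -A OUTPUT -p tcp \
  --dport 3128 -m owner --uid-owner squid -j ACCEPT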

If you want a user to be exempt from filtering -- a parent, for
example -- issue the following command. Replace EXEMPT_USER with the
username that you wish to exempt from filtering:

iptables -t nat -A OUTPUT -p tcp \
  --dport 80 -m owner --uid-owner EXEMPT_USER -j ACCEPT

The next command redirects Internet traffic from all users, other than squid and any exempt users, to the filter on port 8080:

iptables -t nat -A OUTPUT -p tcp \
  --dport 80 -j REDIRECT --to-ports 8080

Since we have a proxy server set up, a user could configure a Web
browser to bypass the filter and access the proxy directly. The Squid proxy is
listening for requests from the computer, and it doesn't care which
user sends the request. We could set up our firewall to deny all access
to the proxy except from our filter, but let's be a little sneakier.
Let's set it up so that direct requests to the Squid proxy server, except from our
filter, get redirected through the filter. To do this, use the following
command:

iptables -t nat -A OUTPUT -p tcp \
  --dport 3128 -j REDIRECT --to-ports 8080
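
The order of these rules matters, because iptables walks a chain from
top to bottom and acts on the first rule that matches. The Python
sketch below is a toy model of that behavior, not a firewall tool: it
lists the five rules in the order they were added and reports what
happens to a connection from a given user. The user names "parent" and
"child" are placeholders, and the model assumes the chain's default
policy is ACCEPT.

```python
# Each rule mirrors one iptables command from this article, in order.
# owner=None means the rule applies to every user.
RULES = [
    {"dport": 80, "owner": "squid", "target": "ACCEPT"},
    {"dport": 3128, "owner": "squid", "target": "ACCEPT"},
    {"dport": 80, "owner": "parent", "target": "ACCEPT"},   # exempt user
    {"dport": 80, "owner": None, "target": "REDIRECT:8080"},
    {"dport": 3128, "owner": None, "target": "REDIRECT:8080"},
]


def decide(user, dport, rules=RULES):
    """Return the target of the first rule matching this connection."""
    for rule in rules:
        if rule["dport"] != dport:
            continue
        if rule["owner"] is not None and rule["owner"] != user:
            continue
        return rule["target"]
    # No rule matched: fall back to the assumed default policy.
    return "ACCEPT"


if __name__ == "__main__":
    for user, port in [("squid", 80), ("child", 80),
                       ("child", 3128), ("parent", 80)]:
        print(user, port, decide(user, port))
```

In this model a connection from "child" to port 80 or 3128 ends up at
the filter on port 8080, while "squid" and the exempt user go straight
out on port 80. If the ACCEPT rules were added after the REDIRECT
rules, the filter's own requests would be redirected back to itself.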

Some systems, such as MandrakeLinux, utilize an application called Shorewall to manage
firewall rules. For these systems, place the above firewall rules in
/etc/shorewall/start, to use the filtering when Shorewall starts, and
in /etc/shorewall/stop, to make them stick if you should stop Shorewall
for some reason. To implement the new rules, simply restart Shorewall
using the following command:

service shorewall restart

For systems using Shorewall, your firewall rules are set. For all other
systems, you'll need to perform the next two steps in order to get the
new firewall rules started at boot time. Issue the following command to
save your firewall rules:

iptables-save > /etc/sysconfig/iptables

Now issue the following to make sure iptables is started at boot
time and to start the iptables firewall:

chkconfig iptables on
service iptables restart

You may also need to make sure that DansGuardian and Squid get
started at boot by using the following two commands:

chkconfig squid on
chkconfig dansguardian on

To get the filtering started, you can now enter the following commands:

service squid restart
service dansguardian restart
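
If you want to confirm that the filter came up, one simple check is to
see whether anything accepts connections on 127.0.0.1 port 8080. The
Python sketch below does only that; port_is_open is a name made up for
this example, and a successful connection shows that some program is
listening, not that filtering is configured correctly.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 8080 is the filterport value set in dansguardian.conf above.
    if port_is_open("127.0.0.1", 8080):
        print("Something is listening on 127.0.0.1:8080")
    else:
        print("Nothing is listening on 127.0.0.1:8080")
```

Probing port 3128 the same way is less informative: with the firewall
rules above in place, connections to 3128 from any account other than
squid are redirected to the filter on port 8080.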

Now when users enter a forbidden Web address they will be presented
with an "Access Denied" page instead of the offending site. You can
customize the look of the "Access Denied" page by editing the
template.html file in the appropriate language section located in
/etc/dansguardian/languages.

Final thoughts

While the setup discussed in this article is intended for use on a
single computer, this method of Web filtering can be applied to a wide
range of scenarios. These tools can be easily and successfully
implemented on a small home network, a large business infrastructure,
or any environment that needs to comply with the Children's Internet
Protection Act.

Bear in mind that Web filtering software of any kind is not 100%
failsafe, nor is it a substitute for parental supervision. Along with
installing filtering software, educate yourself and your children about
the Internet.
