April 24, 2006

How to configure a low-cost load-balanced LAMP cluster

Author: Keith Winston

The ubiquitous Linux, Apache, MySQL, and PHP/Perl/Python (LAMP) combination powers many interactive Web sites and projects. It's not at all unusual for demand to exceed the capacity of a single LAMP-powered server over time. You can take load off by moving your database to a second server, but when demand exceeds a two-server solution, it's time to think cluster.

A LAMP cluster is not the Beowulf kind of cluster that uses specialized message-passing software to tackle a computation-intensive task. Nor does the design described here cover high-availability features, such as automatic failover. Rather, it is a load-sharing cluster that distributes Web requests among multiple Web and database servers while appearing to be a single server.

All the software required to implement a LAMP cluster ships with most Linux distributions, so it's easy to implement. We'll construct a cluster using seven computers for a fictitious company, foo.com. Two servers will run DNS, primary and backup, to distribute Web requests among three Web servers that read and write data from two MySQL database servers. You could build any number of different designs, with more or fewer of each kind of server, but this model will serve as a good illustration of what can be done.

Load balancing

The first part of the cluster handles load balancing by using the round robin feature of the popular DNS software Berkeley Internet Name Daemon (BIND). Round robin DNS is a load balancing method of serving requests for a single hostname, such as www.foo.com, from multiple servers.

To use round robin, each Web server must have its own public IP address. Many organizations use network address translation and port forwarding at the firewall to assign each Web server a public IP address while internally using a private address. In my DNS example, I show private IP addresses, but public IPs are required for the Web servers so DNS can work its magic.

This snippet from the DNS zone definition for foo.com assigns the same name to each of the three Web servers, but uses different IP addresses:

;
; Domain database for foo.com
;
foo.com.                IN      SOA     ns1.foo.com. hostmaster.foo.com. (
                        2006032801 ; serial
                        10800 ; refresh
                        3600 ; retry
                        86400 ; expire
                        86400 ; default_ttl
                        )
;
; Name servers
;
foo.com.                IN      NS      ns1.foo.com.
foo.com.                IN      NS      ns2.foo.com.
;
; Web servers
; (private IPs are shown for illustration, but public IPs are required)
;
www                     IN  A  10.1.1.11
www                     IN  A  10.1.1.12
www                     IN  A  10.1.1.13

When the DNS server gets requests to resolve the name www.foo.com, it will return one IP address the first time, then a different address for the next request, and so on. Theoretically, each Web server will get one-third of the Web traffic. Due to DNS caching and because some requests may use more resources than others, the load will not be shared equally, but over time it should be close enough.
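To make the rotation concrete, here is a minimal Python sketch that models how successive answers cycle through the three addresses from the zone file. It is only a simulation of the idea, not BIND's implementation; the function names are made up for illustration, and it ignores caching, which is why real traffic is less even than this.

```python
from collections import Counter, deque

# Addresses taken from the zone file above; private IPs for illustration only.
WEB_SERVERS = ["10.1.1.11", "10.1.1.12", "10.1.1.13"]


def round_robin_answers(addresses, queries):
    """Yield the first address a client would see for each of `queries` lookups.

    This models only the rotation of the record set between answers.
    """
    ring = deque(addresses)
    for _ in range(queries):
        yield ring[0]      # clients typically connect to the first address
        ring.rotate(-1)    # the next answer starts with the next address


def share_of_traffic(addresses, queries):
    """Count how many of `queries` lookups land on each address."""
    return Counter(round_robin_answers(addresses, queries))


if __name__ == "__main__":
    for address, hits in sorted(share_of_traffic(WEB_SERVERS, 9000).items()):
        print(address, hits)
```

With no caching in the model, every address receives exactly one-third of the lookups.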

If round robin DNS is too crude, and you have some money to throw at the problem, a number of companies sell hardware load balancing equipment that offers better performance. Some even take into account the actual load on each Web server to maximize cluster performance instead of just delegating incoming requests evenly.

Web servers

Configuring the Web servers for use in a cluster is largely the same as configuring a single Apache Web server, with one exception. Content on all the Web servers has to be identical, in order to maintain the illusion that visitors are using one Web site and not three. That requires some mechanism to keep the content synchronized.

My tool of choice for this task is rsync. To keep things in sync with rsync, designate one server, web1 for example, as the primary Web server, and the other two as secondaries. Make content changes only on the primary Web server, and let rsync and cron update the others every minute -- or whatever interval you think is best, depending on how often content on the server is updated. Because rsync transfers only the differences between the source and destination files, content updates happen quickly.

I recommend creating a special user account on each Web server, called "syncer" or something similar. The syncer account needs to have write permissions to the Web content directory on each server. Then, generate a pair of secure shell (SSH) keys for the syncer account using ssh-keygen on the primary Web server and append the public key to /home/syncer/.ssh/authorized_keys on the other two Web servers. This allows you to use rsync over SSH without needing a password for authentication to keep the content up-to-date at regular intervals.

This short shell script uses rsync to update the Web content:

#!/bin/bash
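# -a (archive) already implies -r; --delete removes files on the
# secondaries that no longer exist on the primary.
rsync -a -v -e "ssh -l syncer" --delete /var/www/ web2:/var/www/
rsync -a -v -e "ssh -l syncer" --delete /var/www/ web3:/var/www/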

Set up the script in cron to run regularly and push updates out to web2 and web3.
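If you would rather have cron run something that reports which secondary failed, the same push can be sketched in Python. This is one possible arrangement, not a replacement for the script above: the function names and the --run switch are my own inventions, and the rsync options mirror the shell version (without -r, since -a implies it).

```python
import shlex
import subprocess
import sys

SOURCE_DIR = "/var/www/"          # trailing slash: copy the directory's contents
SECONDARIES = ["web2", "web3"]    # host names from the script above
SYNC_USER = "syncer"


def build_rsync_command(host, source=SOURCE_DIR, user=SYNC_USER):
    """Return the rsync argument list that pushes `source` to `host`."""
    return [
        "rsync", "-a", "-v",
        "-e", "ssh -l " + user,
        "--delete",
        source,
        host + ":" + source,
    ]


def push_content(hosts=SECONDARIES, runner=subprocess.run):
    """Run rsync once per host; return the hosts whose sync failed."""
    failed = []
    for host in hosts:
        result = runner(build_rsync_command(host))
        if result.returncode != 0:
            failed.append(host)
    return failed


if __name__ == "__main__":
    if "--run" in sys.argv[1:]:
        failed_hosts = push_content()
        if failed_hosts:
            print("sync failed for: " + ", ".join(failed_hosts), file=sys.stderr)
            sys.exit(1)
    else:
        # Default to listing the commands so they can be reviewed first.
        for name in SECONDARIES:
            print(shlex.join(build_rsync_command(name)))
```

Run without arguments, it only prints the commands it would execute; cron would call it with --run.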

The cookie conundrum and application design

Sessions can be a tricky issue when LAMP applications use this kind of cluster. The session cookie itself lives in the visitor's browser, but the session data it points to is stored on the Web server; PHP, for example, typically writes session files to a local directory such as /tmp. If a visitor starts a session on one Web server, but subsequent HTTP requests are handled by a different Web server in the cluster, the session data won't be there and things won't work as expected.

Because the IP address of a Web server is cached locally, this doesn't happen often, but it is something that must be accounted for, and may require some application programming changes. One solution to the cookie problem is to use a shared session directory for all Web servers. Be particularly aware of this issue when using pre-built LAMP applications.
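The failure is easy to reproduce in miniature. The Python sketch below is a toy model, not real Apache or PHP behavior: the WebServer class and the session ID are hypothetical, and a plain dictionary stands in for the storage, whether local or shared.

```python
class WebServer:
    """A stand-in for one Web server node with its own session storage."""

    def __init__(self, name, session_store=None):
        self.name = name
        # Without a shared store, each server keeps sessions to itself,
        # much like session files in a local directory.
        self.sessions = {} if session_store is None else session_store

    def login(self, session_id, user):
        self.sessions[session_id] = {"user": user}

    def whoami(self, session_id):
        session = self.sessions.get(session_id)
        return session["user"] if session else None


def simulate(servers):
    """Log in on the first server, then ask each server who the visitor is."""
    servers[0].login("abc123", "alice")
    return [(server.name, server.whoami("abc123")) for server in servers]


if __name__ == "__main__":
    print("local: ", simulate([WebServer("web1"), WebServer("web2")]))
    shared = {}
    print("shared:", simulate([WebServer("web1", shared), WebServer("web2", shared)]))
```

With local storage, web2 has never heard of the visitor who logged in on web1; with the shared store, both servers recognize the session.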

Aside from the cookie issue, the only other requirement for an application is that all database writes are sent to the database master, while reads should be distributed between the master and slave(s). In our example cluster, I would configure the primary Web server to read from the master database server, while the other two Web servers would read from the slave database server. All Web servers write to the master database server.
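In application code, that rule usually comes down to one small routing decision per query. Here is a rough Python sketch of it. The host names db1 and db2 are assumed names for the master and slave database servers, the mapping table is hypothetical, and the classification by first keyword is deliberately crude: anything that is not obviously a read goes to the master.

```python
# Assumed names: db1 is the master database server, db2 the slave.
MASTER_DB = "db1"

# Which database each Web server reads from, following the layout above.
READ_DB_FOR_WEB_SERVER = {"web1": "db1", "web2": "db2", "web3": "db2"}

# Statements treated as reads; everything else is sent to the master.
READ_VERBS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


def choose_database(web_server, sql):
    """Return the database host that `web_server` should use for `sql`."""
    words = sql.split(None, 1)
    verb = words[0].upper() if words else ""
    if verb in READ_VERBS:
        # Fall back to the master for a Web server missing from the table.
        return READ_DB_FOR_WEB_SERVER.get(web_server, MASTER_DB)
    return MASTER_DB


if __name__ == "__main__":
    print(choose_database("web2", "SELECT name FROM customers"))
    print(choose_database("web2", "UPDATE customers SET name = 'x'"))
```

Defaulting to the master is the safe choice, because a write that reaches the slave never makes it back to the master.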

Database servers

MySQL has a replication feature to keep databases on different servers synchronized. It uses what is known as log replay, meaning that a transaction log is created on the master server which is then read by a slave server and applied to the database. As with the Web servers, we designate one database server as the master -- call it db1 to match the naming convention we used earlier -- and the other one, db2, is the slave.

To set up the master, first create a replication account -- a user ID defined in MySQL, not a system account, that is used by the slaves to authenticate to the master in order to read the logs. For simplicity, I'll create a MySQL user called "copy" with a password of "copypass." You will need a better password for a production system. This MySQL command creates the copy user and gives it the necessary privileges:

GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO copy@"10.1.0.0/255.255.0.0" IDENTIFIED BY 'copypass';

Next, edit the MySQL configuration file, /etc/my.cnf, and add these entries in the [mysqld] section:

# Replication Master Server (default)
# binary logging is required for replication
log-bin

# required unique id
server-id = 1

The log-bin entry enables the binary log file required for replication, and the server-id entry gives this server an ID that must be unique among all the servers taking part in replication; the master gets 1 here. After editing the file, restart MySQL. You should see the new binary log file in the MySQL directory with the default name of $HOSTNAME-bin.001. MySQL will create new log files as needed.

To set up the slave, edit its /etc/my.cnf file and add these entries in the [mysqld] section:

# required unique id
server-id = 2
#
# The replication master for this slave - required
# (replace with the actual IP of the master database server)
master-host =   10.1.1.21
#
# The username the slave will use for authentication when connecting
# to the master - required
master-user     =   copy

# The password the slave will authenticate with when connecting to
# the master - required
master-password =   copypass

# How often to retry lost connections to the master
master-connect-retry = 15

# binary logging - not required for slaves, but recommended
log-bin

While it's not required, it is good planning to create the MySQL replication user (copy in our example) on each slave in case it needs to take over from the master in an emergency.

Restart MySQL on the slave and it will attempt to connect to the master and begin replicating transactions. When replication is started for the first time (even unsuccessfully), the slave will create a master.info file with all the replication settings in the default database directory, usually /var/lib/mysql.

To recap the database configuration steps:

  1. Create a MySQL replication user on the master and, optionally, on the slave.
  2. Grant privileges to the replication user.
  3. Edit /etc/my.cnf on master and restart MySQL.
  4. Edit /etc/my.cnf on the slave(s) and restart MySQL.

How to tell if replication is working

On the master, log in to the MySQL monitor and run show master status:

mysql> show master status \G
*************************** 1. row ***************************
            File: master-bin.006
        Position: 73
    Binlog_do_db:
Binlog_ignore_db:
1 row in set (0.00 sec)

On the slave, log in to the MySQL monitor and run show slave status:

mysql> show slave status \G
*************************** 1. row ***************************
         Master_Host: master.foo.com
         Master_User: copy
         Master_Port: 3306
       Connect_retry: 15
     Master_Log_File: intranet-bin.006
               [snip]
    Slave_IO_Running: Yes
   Slave_SQL_Running: Yes

The most important fields are Slave_IO_Running and Slave_SQL_Running. They should both have values of Yes. Of course, the real test is to execute a write query to a database on the master and see if the results appear on the slave. When replication is working, slave updates usually appear within milliseconds.
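If you want to automate that check, the vertical output is simple to parse. The Python sketch below works on text you have already captured from the mysql client; it does not connect to MySQL itself. The function names are hypothetical, and it assumes the SQL thread is reported under the name Slave_SQL_Running, which is how MySQL labels it.

```python
SAMPLE = """\
*************************** 1. row ***************************
         Master_Host: master.foo.com
         Master_User: copy
    Slave_IO_Running: Yes
   Slave_SQL_Running: Yes
"""


def parse_vertical_output(text):
    """Turn the 'Name: value' lines of vertical mysql output into a dict."""
    fields = {}
    for line in text.splitlines():
        name, separator, value = line.partition(":")
        # Row banners such as '*** 1. row ***' have no colon and are skipped.
        if separator and not line.lstrip().startswith("*"):
            fields[name.strip()] = value.strip()
    return fields


def replication_healthy(text, required=("Slave_IO_Running", "Slave_SQL_Running")):
    """Return True only if every required thread reports 'Yes'."""
    fields = parse_vertical_output(text)
    return all(fields.get(name) == "Yes" for name in required)


if __name__ == "__main__":
    print("replication healthy:", replication_healthy(SAMPLE))
```

A missing field counts as a failure, so empty or truncated output is never mistaken for a healthy slave.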

Recovering from a database error

If the slave database server loses power or the network connection, it will no longer be able to stay synchronized with the master. If the outage is short, replication should pick up where it left off. However, if a serious error occurs on the slave, the safest way to get replication working again is to:

  1. Stop MySQL on the master and slave.
  2. Dump the master database.
  3. Reload the database on the slave.
  4. Start MySQL on the master.
  5. Start MySQL on the slave.

Depending on the nature of the problem, a full reload on the slave may not be necessary, but this procedure should always work.

If the problem is with the master database server and it will be down for a while, you can reconfigure the slave as the master by updating its IP address and /etc/my.cnf file. All Web servers then must be changed to send both their reads and their writes to the new master. When the old master is repaired, it can be brought up as a slave server and the Web servers changed to read from the slave again.

MySQL provides another option in NDB, a special storage engine designed for distributed databases that has been available since the 4.1 series. For more in-depth information on MySQL clustering, see the MySQL Web site or High Performance MySQL by Jeremy Zawodny and Derek Balling.

Going large

Clusters make it possible to scale a Web application to handle a tremendous number of requests. As traffic builds, network bandwidth also becomes an issue. Top-tier hosting providers can supply the redundancy and bandwidth required for scaling.