August 23, 2006

CAPTCHA your blog comments with FOSS utilities

Author: Donald W. McArthur

Bloggers hate automated comment spammers. One way to foil these vermin is a system called CAPTCHA, which is described in Wikipedia as "an acronym for 'completely automated public Turing test to tell computers and humans apart'" -- in other words, a challenge-response system designed to determine whether a site visitor is human or a bot. Here's how I implemented a CAPTCHA system for blog comments using free and open source (FOSS) command-line utilities and PHP, the Web server scripting language.

I've implemented this system under two Linux distributions -- Ubuntu Dapper Drake and CentOS 4 -- both running the current versions of the Apache Web server and PHP. The CLI utilities are from GNU Enscript and ImageMagick.

Ubuntu required that I download, compile, and install Enscript. I made no changes to the default configuration settings, and the install was without incident. ImageMagick is in the Dapper Drake repositories, so the command sudo apt-get install imagemagick got that installed quickly.

On the CentOS box, neither utility was installed by default, nor available via yum update, so I had to install both applications manually. Again, I made no changes to the default configuration settings, and the installation went without incident.

The blog comments script

My plan for the PHP script for the blog comments page was as follows:

  • Design the overall system such that session state could be ignored. I wanted to avoid the complexity of keeping track on the server of which image had been sent to which blog commenter.
  • Generate a random six-character string
  • Create a PostScript file using the string
  • Create a .png image file from the PostScript file
  • Name the image file using a limited pool of filenames
  • Encrypt the original string
  • Return the image file and encrypted string to the user

This is the relevant portion of the PHP code I used to accomplish those goals:

I create an array of characters that comprise the pool from which to randomly select six. For image clarity, I avoid the numerals one and zero, and the upper and lower case letters "L" and "O":

$arr_chars = array ('a','b','c','d','e','f','g','h','j','k','m','n','p','r','s','t','u',
(all on one line)

The function array_rand() takes two arguments: the name of an array, and the number of randomly chosen keys you wish returned. The result is stored in another array:

$arr_rand_keys = array_rand ($arr_chars, 6);

I concatenate the six characters into a string:

$captcha_cleartext = $arr_chars[$arr_rand_keys[0]] . $arr_chars[$arr_rand_keys[1]] . $arr_chars[$arr_rand_keys[2]] . $arr_chars[$arr_rand_keys[3]] . $arr_chars[$arr_rand_keys[4]] . $arr_chars[$arr_rand_keys[5]];
(all on one line)
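One property of array_rand() worth knowing: it returns distinct keys, in the order they appear in the source array, so a string built this way never repeats a character. If that matters to you, the following is a minimal sketch of an alternative that draws each character independently. The helper name random_captcha_string() and the short $pool are illustrative assumptions, not the array used above, and random_int() requires PHP 7 or later.

```php
<?php
// Hypothetical helper: pick $length characters from $pool, with repeats allowed.
// $pool must be a plain list (keys 0..n-1).
function random_captcha_string(array $pool, int $length = 6): string
{
    $text = '';
    $max = count($pool) - 1;
    for ($i = 0; $i < $length; $i++) {
        // random_int() draws from the system's secure random source.
        $text .= $pool[random_int(0, $max)];
    }
    return $text;
}

// Illustrative pool only; substitute your own unambiguous characters.
$pool = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', '2', '3', '4', '5'];
$captcha_cleartext = random_captcha_string($pool);
```

Allowing repeats and arbitrary ordering enlarges the space of possible strings, which makes guessing slightly harder.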

I encrypt the string for use as a hidden input field. The crypt() function takes two arguments: the cleartext string, and a "salt" used to "seed" the encryption process. If a salt is not provided, the system will provide a random one, which will foil our efforts. Substitute something reasonably complex for 'salt_value'. You will need this value again to compare the user's input in the PHP script that handles this form's submission. It is essential that the 'salt_value' be the same in both PHP scripts.

$captcha_encrypted = crypt ($captcha_cleartext, 'salt_value');
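The reason a fixed salt matters can be shown in a few lines. This is a small sketch, assuming the placeholder salt 'salt_value' from above and made-up cleartext values: with the same salt, crypt() is deterministic, so the form handler can recompute the value and compare it.

```php
<?php
$salt = 'salt_value';  // placeholder from the article; use your own value

// The same cleartext and salt always produce the same result...
$issued   = crypt('abc234', $salt);
$repeated = crypt('abc234', $salt);

// ...while a different cleartext produces a different result.
$other = crypt('xyz789', $salt);

var_dump($issued === $repeated);
var_dump($issued === $other);
```

Strictly speaking, crypt() is a one-way hash rather than encryption, which is why the comparison has to be done on the hashed values.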

In order to avoid an automated attack that repeatedly executes this script and fills the hard drive with CAPTCHA images, I limit the pool of possible image filenames to a reasonable number, then re-use them. I start with a text file named captcha_filenum that contains the single numeral zero. The file() function reads the file into an array with one element per line, so the number ends up in element zero:

$arr_filenum = file ('/var/www/html/captcha_filenum');

Then I concatenate a CAPTCHA image filename using the number:

$captcha_filename = 'captcha_' . $arr_filenum[0] . '.png';

I limit the pool of filenames to 100. I increment the filenum value unless it is greater than 98:

if ($arr_filenum[0] > 98) {
    $new_value = 0;
} else {
    $new_value = $arr_filenum[0] + 1;
}

Next, I write the new filenum value back to the file captcha_filenum. The Apache Web server runs as the system account apache, which must have write permissions for the directory.

$fh = fopen ('/var/www/html/captcha_filenum', 'w');
fwrite ($fh, $new_value);
fclose ($fh);
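When two visitors load the comments page at the same moment, both requests can read the same number before either writes the new one. The following is a minimal sketch of the same read-increment-write cycle done under an exclusive lock, assuming a hypothetical helper named next_captcha_filenum(); the name and the $pool_size parameter are not part of the script above.

```php
<?php
// Hypothetical helper: returns the current file number and stores the next
// one, holding an exclusive lock so concurrent requests cannot interleave.
function next_captcha_filenum(string $counter_file, int $pool_size = 100): int
{
    // 'c+' opens for reading and writing, creates the file if it is
    // missing, and does not truncate it.
    $fh = fopen($counter_file, 'c+');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $counter_file");
    }
    try {
        flock($fh, LOCK_EX);
        $current = (int) trim((string) stream_get_contents($fh));
        if ($current < 0 || $current >= $pool_size) {
            $current = 0;  // recover from a corrupt or out-of-range value
        }
        $next = ($current + 1) % $pool_size;  // wrap around at the pool size
        ftruncate($fh, 0);
        rewind($fh);
        fwrite($fh, (string) $next);
        fflush($fh);
        flock($fh, LOCK_UN);
    } finally {
        fclose($fh);
    }
    return $current;
}
```

A call such as next_captcha_filenum('/var/www/html/captcha_filenum') would then stand in for the file(), if/else, and fwrite() steps.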

I want to put the CAPTCHA images in a directory I can exclude from the backup process:

$path = "/var/www/html/captchas/";

Now I concatenate the path and filename:

$full_path = $path . $captcha_filename;

I use the command-line utilities enscript and convert (which is supplied by ImageMagick) to first turn the randomly generated six-character string into a PostScript file, and then into an image file. A shell pipe feeds the output of one command into the input of the next. I use the font value Courier-BoldOblique20, but you can use any font, as long as it exists in your directory /usr/local/share/enscript/afm/. Since Apache doesn't know where the command-line utilities are located, I provide full paths to them:

$command = "echo '$captcha_cleartext' | /usr/local/bin/enscript -o - -B -f 'Courier-BoldOblique20' | /usr/local/bin/convert -trim +repage - $full_path";
(all on one line)

Now execute the command:

exec ("$command");
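The values interpolated into the command here come from the script itself, so they are under your control. If you ever adapt the approach to values that are not, it is safer to quote them for the shell. The following is a minimal sketch, assuming a hypothetical helper named build_captcha_command(); it uses the same utility paths and options as the command above.

```php
<?php
// Hypothetical helper: builds the echo | enscript | convert pipeline and
// shell-quotes both interpolated values.
function build_captcha_command(string $cleartext, string $full_path): string
{
    $enscript = '/usr/local/bin/enscript';  // paths as given in the article
    $convert  = '/usr/local/bin/convert';

    return 'echo ' . escapeshellarg($cleartext)
        . ' | ' . $enscript . " -o - -B -f 'Courier-BoldOblique20'"
        . ' | ' . $convert . ' -trim +repage - ' . escapeshellarg($full_path);
}

// Usage (only meaningful where both utilities are installed):
// exec(build_captcha_command($captcha_cleartext, $full_path), $output, $status);
// if ($status !== 0) {
//     error_log('CAPTCHA image generation failed');
// }
```

The optional second and third arguments to exec() capture the output lines and the exit status. For a pipeline, that status is the one reported by the last command, which here is convert.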

After I display the blog comment form elements (text boxes and textarea boxes) I display the CAPTCHA image. I also include, as a hidden field, the encrypted string representing the original randomly chosen six-character string. This encrypted string will be returned to the server with the user's CAPTCHA submission. The user can do no harm by having the encrypted string, and by using this technique I don't have to keep track of the session state.

print "Prove you're not a bot. Enter this: <img src=\"/captchas/$captcha_filename\" /> here: <input type=\"text\" name=\"captcha_test\" size=\"10\" />";
(all on one line)

print "<input type=\"hidden\" name=\"captcha_encrypted\" value=\"$captcha_encrypted\" />";
(all on one line)

The CAPTCHA image file and text entry box will be displayed like this:

And the hidden field entry will look like this:

<input type="hidden" name="captcha_encrypted" value="tojt1Xx62dqSA" />

The form handling script

Now that we've displayed a CAPTCHA image and asked the user for input we have to handle the submitted data in another PHP script. The second script will:

  • Encrypt the user's entry using the same "salt" used to encrypt the original randomly generated string
  • Compare that encrypted string with the encrypted string returned in the hidden field
  • If the two don't match, reject the comment submission

This is the relevant portion of the PHP code I use to accomplish those goals:

I trim the user's submission to remove leading and trailing whitespace:

$comment_test = trim ($_POST['captcha_test']);

Then I gather the encrypted string from the hidden field.

$returned_encrypt = $_POST['captcha_encrypted'];

There is no decrypt() function -- crypt() is a one-way process. I use crypt() and the 'salt_value' to encrypt the user's submission. I can then compare the result with the encrypted CAPTCHA string returned from the hidden field in the comments page. If the user accurately entered what was displayed on the CAPTCHA image, the two should be equivalent.

$test_encrypt = crypt ($comment_test, 'salt_value');

I don't do anything else unless the captcha has been entered correctly:

// Use === so that PHP compares the two values strictly as strings.
if ($returned_encrypt === $test_encrypt) {
    // Add the comment to the database.
} else {
    // Display a rejection notice.
}
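The trim, crypt, and compare steps can also be gathered into one function. This is a minimal sketch, assuming a hypothetical helper named captcha_matches(); it swaps the plain comparison for hash_equals() (PHP 5.6 or later), which compares two strings in constant time.

```php
<?php
// Hypothetical helper: true only if the visitor's entry hashes to the value
// that came back in the hidden field.
function captcha_matches(string $submitted, string $returned_hash, string $salt): bool
{
    $test_hash = crypt(trim($submitted), $salt);
    // First argument is the value we computed, second is the user-supplied one.
    return hash_equals($test_hash, $returned_hash);
}

// Usage in the form handler; missing fields fall back to empty strings:
// $ok = captcha_matches(
//     $_POST['captcha_test'] ?? '',
//     $_POST['captcha_encrypted'] ?? '',
//     'salt_value'
// );
```

Falling back to empty strings means a request that omits either field is rejected rather than raising a notice.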

That's all there is to it. The CLI utilities enscript and convert create image files on the fly, and the PHP crypt() function allows us to safely test blog comment submissions for human origin.

Update -- Someone just visited my site and defeated my CAPTCHA by using a script to resend input cleartext and encrypted values. The exploit involved repeatedly submitting comments using the same encrypted and cleartext versions of the CAPTCHA.

To solve the problem, I created a new database table to store the CAPTCHA as it is issued, mark it as "used" when it is returned with a comment, and accept no more comments utilizing that CAPTCHA. Also, having been issued, the CAPTCHA is no longer available for issuance for an arbitrary time period.
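One way such a table could work is sketched below. Everything in it is an assumption made for illustration: the table name captcha_issued, its columns, the helper function names, the one-hour default, and the choice of an in-memory SQLite database through PDO (INSERT OR IGNORE is SQLite syntax). The actual schema and database are not described here.

```php
<?php
// Hypothetical schema: one row per issued CAPTCHA, keyed by its hashed value.
function captcha_store_init(PDO $db): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS captcha_issued (
        hash      TEXT PRIMARY KEY,
        issued_at INTEGER NOT NULL,
        used      INTEGER NOT NULL DEFAULT 0
    )');
}

// Record a newly issued CAPTCHA. Returns false if the same value was issued
// within the last $ttl seconds, in which case the caller should generate
// a different string.
function captcha_issue(PDO $db, string $hash, int $now, int $ttl = 3600): bool
{
    $purge = $db->prepare('DELETE FROM captcha_issued WHERE issued_at < ?');
    $purge->execute([$now - $ttl]);

    $insert = $db->prepare(
        'INSERT OR IGNORE INTO captcha_issued (hash, issued_at) VALUES (?, ?)'
    );
    $insert->execute([$hash, $now]);
    return $insert->rowCount() === 1;
}

// Mark a CAPTCHA as used. Returns true only the first time, so a replayed
// cleartext/hash pair is refused.
function captcha_redeem(PDO $db, string $hash): bool
{
    $update = $db->prepare(
        'UPDATE captcha_issued SET used = 1 WHERE hash = ? AND used = 0'
    );
    $update->execute([$hash]);
    return $update->rowCount() === 1;
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
captcha_store_init($db);
```

In the form handler, a comment would be accepted only when the CAPTCHA comparison succeeds and captcha_redeem() also returns true. Because the check and the update happen in a single UPDATE statement, two simultaneous replays cannot both succeed.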

Whew. That'll take you down a peg.

