February 18, 2004

Tux's got game!

Author: Carlos Justiniano

We recently used Linux to orchestrate a network of more than 2,000 machines from more than 50 countries, in real time, to become the first distributed global network to play a chess game against a single human opponent, setting a world record in the process.

The ChessBrain project is a non-profit distributed computing experiment that harnesses the processing power of remote machines. While ChessBrain is functionally similar to other distributed computation projects such as SETI@home, Folding@Home, and GIMPS, ChessBrain is unique in that it requires results in real time!

On Friday, January 30, ChessBrain challenged chess Grandmaster Peter Nielsen, one of the world's top players, to a game of chess. The event was organized and hosted by the Danish Unix User Group (DKUUG) and took place live in Denmark, at Copenhagen's Symbion Science Park.

The ChessBrain team and DKUUG members review the game's highlights with chess Grandmaster Peter Nielsen (shown on the right)

The game was played under official tournament conditions and according to the guidelines established by the Guinness World Record office in London. ChessBrain held its own: the game lasted 34 moves and ended in a draw. A record 2,070 computers participated in the event, including several compute clusters, the largest of which was The BioCluster, a cluster of 241 Dell OptiPlex 2.4GHz Pentium 4 machines that are normally used for bioinformatics research. The Guinness organization awarded ChessBrain the record for the largest number of machines used to play a game against a single human opponent.

ChessBrain's successful world record attempt marks the first time in history that a distributed network has played a game against a single human opponent.

How ChessBrain plays chess

ChessBrain plays chess a bit differently from a human. Imagine playing a game against an opponent. Each time your opponent makes a move you quickly analyze the resulting position and determine your most likely replies. Next, you call twenty of your closest peers and ask each to analyze a particular move response. One by one, each person returns your call and describes their resulting analysis. You then consider all of the replies and decide which move to actually play.
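The phone-call analogy above can be sketched as a fan-out and gather step. This is only an illustration, not ChessBrain's actual code: the function names, the toy scores, and the use of local threads in place of remote PeerNodes are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical scores standing in for real chess analysis.
TOY_SCORES = {"e2e4": 30, "d2d4": 35, "g1f3": 25, "c2c4": 20}


def analyze_reply(move):
    """Stand-in for one peer analyzing one candidate move (assumed, not real)."""
    return TOY_SCORES.get(move, 0)


def choose_move(candidates, analyze=analyze_reply, max_workers=20):
    """Hand each candidate move to a worker, collect every reply, pick the best."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # "Call your peers": one task per candidate move.
        futures = {pool.submit(analyze, move): move for move in candidates}
        # "One by one, each person returns your call."
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    # "Consider all of the replies and decide which move to actually play."
    best = max(results, key=results.get)
    return best, results
```

Calling `choose_move(["e2e4", "d2d4", "g1f3"])` returns the highest-scoring candidate along with every reply that came back. A real-time system would also need a deadline for slow or missing replies, which this sketch leaves out.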

Developing ChessBrain

I began the ChessBrain project in the summer of 2001 to explore distributed computing concepts such as P2P, Web services, and grid computing. British chess program developer Colin Frayn joined the project in 2002 and developed ChessBrain's playing abilities. By December 17, 2002, ChessBrain was playing its first games of chess using a small distributed network.

Carlos Justiniano and Colin Frayn

The ChessBrain Network is a loosely coupled system involving a central server called the SuperNode and thousands of geographically dispersed computers known as the PeerNodes. The SuperNode is a custom application server written specifically for Linux, and the PeerNode client is a cross-platform application that runs on Linux, Windows, and Apple Mac OS X.

The SuperNode server is a concurrent-connection, multithreaded application server written in C++ and compiled using GCC. PeerNodes communicate with the SuperNode server using Simple Object Access Protocol (SOAP) over HTTP. For security reasons, the actual SOAP messages are encrypted using the Advanced Encryption Standard (AES-Rijndael). Additionally, SOAP packages are compressed using ZLib to conserve bandwidth.
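To give a feel for the compression step, here is a minimal sketch that round-trips a SOAP envelope through zlib. The message body (`SubmitResult`, `move`, `score`) is a made-up example, not ChessBrain's real schema, and the AES encryption step is deliberately omitted because the article does not describe how it is combined with compression.

```python
import zlib

# Hypothetical SOAP 1.1 message; element names inside the Body are invented.
SOAP_ENVELOPE = (
    '<?xml version="1.0"?>'
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body><SubmitResult><move>e2e4</move><score>30</score>"
    "</SubmitResult></soap:Body>"
    "</soap:Envelope>"
)


def pack(envelope):
    """Compress a SOAP envelope for transmission (encryption not shown)."""
    return zlib.compress(envelope.encode("utf-8"), 9)


def unpack(payload):
    """Reverse of pack(): decompress and decode the envelope."""
    return zlib.decompress(payload).decode("utf-8")
```

XML is repetitive, so larger envelopes tend to compress well; the exact savings depend on the message content.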

During a game, the SuperNode makes use of the Linux proc file system to track real-time system information, which it periodically stores in a MySQL database.
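As an illustration of reading the proc file system, the sketch below parses `/proc/loadavg`. Which proc entries the SuperNode actually reads is not stated here, so the choice of file and the field names are assumptions, and the MySQL storage step is left out.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dictionary.

    The file holds five fields, for example: "0.20 0.18 0.12 1/80 11206".
    """
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load_1min": float(fields[0]),
        "load_5min": float(fields[1]),
        "load_15min": float(fields[2]),
        "runnable_tasks": int(runnable),
        "total_tasks": int(total),
        "last_pid": int(fields[4]),
    }


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the live file (Linux only)."""
    with open(path) as handle:
        return parse_loadavg(handle.read())
```

A monitoring loop could call `read_loadavg()` at a fixed interval and write each sample to a database table.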

Real-time troubleshooting

During the Copenhagen event we experienced an unexpectedly high level of interest as thousands of machines competed for the SuperNode server's attention, resulting in a situation that resembled a distributed denial of service attack. To address the issue it became necessary to make real-time modifications. While ChessBrain team member Gavin Roy modified the proc file system entries (/proc/sys/net/ipv4/tcp_max_syn_backlog and /proc/sys/net/ipv4/tcp_synack_retries) in California, I modified the actual server code from Denmark to change the connection behavior of remote PeerNode clients. We successfully modified the behavior of the entire network, in real time, without rebooting a single machine!
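Kernel settings under /proc/sys can be read and changed at run time by reading and writing the corresponding files, which is why no reboot was needed. The helper names below are hypothetical, and the values we used during the event are not given here, so any number passed in is only an example. Writing to the real /proc/sys requires root privileges.

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted name such as net.ipv4.tcp_synack_retries to its file."""
    return Path(root) / name.replace(".", "/")


def read_sysctl(name, root="/proc/sys"):
    """Return the current integer value of a kernel setting."""
    return int(sysctl_path(name, root).read_text().strip())


def write_sysctl(name, value, root="/proc/sys"):
    """Set a kernel setting; takes effect immediately, lost on reboot."""
    sysctl_path(name, root).write_text(f"{value}\n")
```

For example, `write_sysctl("net.ipv4.tcp_max_syn_backlog", 2048)` would enlarge the queue of half-open connections, where 2048 is an illustrative value only. The `root` parameter exists so the helpers can be tried against a scratch directory first.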

We're pleased to report that Linux has proven ideal and up to the task of setting a new world record. There's no doubt in our minds that Tux has got game!
