July 10, 2006

PostgreSQL Anniversary Summit a success

Author: David Graham

This weekend marked the 10th anniversary of PostgreSQL's posting as a public, open source project. To celebrate, the PostgreSQL project held a two-day conference at Ryerson University in downtown Toronto, Ontario, Canada.

The conference started with a keynote address by Bruce Momjian, one of the longest-serving and best-known developers of the project, discussing why the conference was taking place, a bit of the history of PostgreSQL, and its future. Momjian opened his talk by announcing, to laughter, that the PostgreSQL patch queue was empty.

Momjian called his role at PostgreSQL a tremendous honor, and says he does not know what the next ten years will bring for the project. He did predict that tools like PostgreSQL would become more popular.

Great days, Momjian philosophized, rarely announce themselves. Weddings and graduations come with dates and invitations, but most other significant events just happen. Along the same lines, he noted that open source developers grow into their roles. Many start by submitting a patch in a few hours of free time. Their contributions snowball, and they eventually find themselves employed full time as a result.

PostgreSQL history

PostgreSQL started in the 1980s at the University of California, Berkeley, though most of the people from that era went on to get "regular jobs", says Momjian. In April 1996, Marc Fournier sent an email to the postgres95 mailing list noting a number of major flaws in the software. Momjian described development at the time as being in maintenance mode. Fournier suggested in his email that, given time and room, it could become a useful project.

The discussion evolved and Fournier offered to host a development server for the project, allowing it to escape from Berkeley and become a modern open source project. Fournier noted in the discussion at the time that Postgres would need to move forward with the help of a few contributors with a lot of time. He commented that a lot of contributors with a little bit of time would not be equivalent.

Fournier's offer to host a CVSup server came on July 8th, 1996, the date the conference commemorates. Soon afterward, work began toward an actual release, allowing the project to graduate out of maintenance mode.

Momjian went on to show the evolution of PostgreSQL's Web site since 1997, from a comical logo of an elephant smashing through a brick wall to the organization's current professional image.

Momjian showed a map of the world with markers at every place he had visited representing PostgreSQL, covering much of North America, Europe, and Asia, and commented that he would soon be adding India and Pakistan to his list of countries. He concluded his keynote with what he termed a show and tell, displaying CDs distributed at several points in the project's history, as well as a Japanese PostgreSQL manual.

Following the keynote, Andy Astor, CEO of EnterpriseDB, got up to make a brief announcement, saying the company has grown to around 100 employees and is based entirely on PostgreSQL. "Thank you PostgreSQL," he says, "for giving me a job to go to." He announced that EnterpriseDB would be giving $25,000 to PostgreSQL as part of ongoing funding earmarked strictly for feature development.

The next talk was by Ayush Parashar of Greenplum on database performance improvements in the PostgreSQL-based Bizgres database. Parashar discussed various algorithmic improvements to Bizgres' sort and copy functionality. He also demonstrated that using a bitmap index instead of a B-tree index vastly improved performance on large databases when the indexed column has low cardinality.
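To show why a bitmap index suits low-cardinality columns, here is a minimal Python sketch of the general technique. It is not Bizgres' implementation; the column names and values are made up for illustration. The index keeps one bitset per distinct value (a Python int used as a bit array), so a column with only a handful of values needs only a handful of bitmaps, and a multi-column filter becomes a single bitwise AND.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index: one bitset (a Python int) per distinct column value."""

    def __init__(self, values):
        self.bitmaps = defaultdict(int)
        for row_id, value in enumerate(values):
            # Set bit number row_id in the bitmap for this value.
            self.bitmaps[value] |= 1 << row_id

    def lookup(self, value):
        """Return the bitset of rows whose column equals value (0 if none)."""
        return self.bitmaps.get(value, 0)


def row_ids(bitset):
    """Decode a bitset into a sorted list of row ids."""
    ids = []
    row_id = 0
    while bitset:
        if bitset & 1:
            ids.append(row_id)
        bitset >>= 1
        row_id += 1
    return ids


# Two hypothetical low-cardinality columns of the same six-row table.
region = ["east", "west", "east", "north", "west", "east"]
status = ["open", "open", "closed", "open", "closed", "open"]
region_idx = BitmapIndex(region)
status_idx = BitmapIndex(status)

# WHERE region = 'east' AND status = 'open' becomes one bitwise AND.
matches = region_idx.lookup("east") & status_idx.lookup("open")
print(row_ids(matches))
```

A B-tree, by contrast, stores one entry per row and gains little when each key matches a large fraction of the table, which is roughly the situation where bitmaps shine.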

Parashar was asked when the improvements would be ported into the PostgreSQL tree. Another Greenplum employee answered that it would be after the code had been further tested and hardened.

PostgreSQL developer and conference organizer Josh Berkus noted that PostgreSQL 8.2 is going into feature freeze in just three weeks, and the Bizgres patches should be submitted as soon as possible to allow them to be integrated into the rest of the tree properly.

Lightning talks

The third session was a bit difficult for me to keep up with, but from what I understood of it, it was quite fascinating. In the course of a one-hour block, 10 speakers were given exactly five minutes each in what was termed a lightning talk. The first two talks were by employees of Voice over Internet Protocol (VoIP) specialist Skype.

Skype, says Hannu Krosing, runs on PostgreSQL internally. To scale to the massive size the company is working toward, it is developing a scalable database system it calls PL/Proxy, which Krosing says will soon be open sourced. PL/Proxy works on the basic principle of splitting databases up by function and then providing a simple way for these separate databases to be integrated.
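The talk summary does not show PL/Proxy's actual interface, so the Python sketch below only illustrates the general idea under stated assumptions: callers invoke a function by name, and a proxy layer decides which separate database runs it. The database names, function names, and the hash-based choice among partitions are all hypothetical, and each "database" is just a dict.

```python
import zlib


class FunctionProxy:
    """Toy proxy layer (hypothetical, not PL/Proxy's API): callers invoke
    functions by name and the proxy decides which database runs each one."""

    def __init__(self):
        self.backends = {}  # database name -> dict standing in for its data
        self.routes = {}    # function name -> (candidate databases, handler)

    def add_backend(self, name):
        self.backends[name] = {}

    def register(self, func_name, backend_names, handler):
        self.routes[func_name] = (backend_names, handler)

    def call(self, func_name, key, *args):
        backend_names, handler = self.routes[func_name]
        # When a function's data is split over several databases, pick one
        # with a stable hash of the key (crc32 gives the same answer every run).
        index = zlib.crc32(key.encode("utf-8")) % len(backend_names)
        return handler(self.backends[backend_names[index]], key, *args)


def set_value(db, key, value):
    db[key] = value
    return value


def get_value(db, key):
    return db.get(key)


proxy = FunctionProxy()
for name in ("users_0", "users_1", "billing"):
    proxy.add_backend(name)

# User functions are split across two databases; billing gets its own.
proxy.register("set_user", ["users_0", "users_1"], set_value)
proxy.register("get_user", ["users_0", "users_1"], get_value)
proxy.register("set_invoice", ["billing"], set_value)
proxy.register("get_invoice", ["billing"], get_value)

proxy.call("set_user", "alice", {"country": "EE"})
proxy.call("set_invoice", "alice", 42)
print(proxy.call("get_user", "alice"))     # {'country': 'EE'}
print(proxy.call("get_invoice", "alice"))  # 42
```

Because the setter and getter for user data share the same database list and hash, a key always lands on the same database, while the caller never needs to know where that is.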

The second part of the Skype lightning talk, by Skype's Asko Oja, was about Skytools, a set of queueing tools designed for hard drive failover and generic queueing.
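The internals of Skytools' queueing were not described, so the following is only a hypothetical Python sketch of a generic queue of the kind such tools provide: producers append events, and each consumer reads them in batches and advances its position only after a batch is finished. If a consumer fails partway through, it simply receives the same batch again. All class and method names are invented for illustration.

```python
class EventQueue:
    """Toy append-only event queue with per-consumer positions (hypothetical)."""

    def __init__(self):
        self.events = []
        self.positions = {}  # consumer name -> index of next unfinished event

    def insert_event(self, payload):
        self.events.append(payload)

    def register_consumer(self, name):
        self.positions.setdefault(name, 0)

    def next_batch(self, name, max_size):
        """Return (start index, events) without advancing the consumer."""
        start = self.positions[name]
        return start, self.events[start:start + max_size]

    def finish_batch(self, name, start, batch):
        # Advance only after the consumer reports success; if it crashes
        # first, the same batch is handed out again on the next call.
        self.positions[name] = start + len(batch)


q = EventQueue()
q.register_consumer("replica")
for i in range(5):
    q.insert_event(f"row {i}")

start, batch = q.next_batch("replica", 3)
# Simulate a consumer crash: the batch was never finished, so it is redelivered.
start2, batch2 = q.next_batch("replica", 3)
q.finish_batch("replica", start2, batch2)
print(q.next_batch("replica", 3))  # (3, ['row 3', 'row 4'])
```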

The third lightning talk was by Hiroshi Saito, a member of the Japanese PostgreSQL Users Group (JPUG), who discussed pgsnmpd, an SNMP daemon for PostgreSQL that allows the operational status of PostgreSQL databases to be monitored.

The next talk in the series was about DBD::Pg, which Greg Sabino Mullane described as the integration of the best database and the best language: PostgreSQL and Perl. The DBD::Pg module, Mullane says, makes do() loops very fast by using libpq, the PostgreSQL client library. He cited some other improvements, such as UTF-8 (Unicode) support.

He says future releases of DBD::Pg will be developed using Subversion or svk, in an effort to move away from CVS, and added that he hopes PostgreSQL will also move to Subversion. He says he would also like to add Windows, Perl 6, Parrot, and DBI v2 support to the module.

The fifth of the ten sessionlets was by someone calling himself only "M", discussing PGX, a PostgreSQL client for Mac OS X. He explained that it is not intended to be a PostgreSQL administration tool, but rather a simple front end for PostgreSQL databases.

PGX supports non-blocking execution, which means the user can keep working with the program while it queries the database. Asked whether it is possible to cancel a query, he succinctly replied that the capability had not yet been written. PGX is written in Objective-C and can run the same query against multiple databases simultaneously.
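PGX itself is Objective-C, so the Python sketch below is not its code; it only illustrates the two ideas described: a query submitted without blocking the caller, and the same query sent to several databases at once. The run_query function is a hypothetical stand-in that sleeps instead of contacting a real server, and the database names are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(database, sql):
    """Hypothetical stand-in for a real database call; sleeps to mimic latency."""
    time.sleep(0.1)
    return f"{database}: result of {sql!r}"


databases = ["sales", "inventory", "archive"]
sql = "SELECT count(*) FROM orders"

with ThreadPoolExecutor(max_workers=len(databases)) as pool:
    # submit() returns immediately, so the user interface stays responsive.
    futures = [pool.submit(run_query, db, sql) for db in databases]
    print("queries submitted; the program is free to do other work")
    # Block only at the point where the results are actually needed.
    results = [f.result() for f in futures]

for line in results:
    print(line)
```

Note that in this design Future.cancel() can only stop work that has not started yet; cancelling a query already running on the server needs separate support, which may be part of why it is the harder feature.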

The sixth session was by Jean-Paul Argudo on using Slony-I as a generic solution for aggregating data from multiple installations. Instead of replicating a master database to a network of slaves, he explained, Slony-I can use a single slave database to replicate a network of master databases, since users do not want to connect to each database separately.
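This is not how Slony-I works internally; it is a hypothetical Python model of the aggregation idea. Several master databases each send their row changes to one replica, which keys every row by (origin, primary key) so that rows with the same key on different masters do not overwrite each other. Users then query one database instead of connecting to each master.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    origin: str           # which master database produced the change
    key: int              # primary key within that master
    row: Optional[dict]   # new row contents; None means the row was deleted


def apply_changes(replica, changes):
    """Apply one master's replicated changes to the aggregating replica.
    Rows are keyed by (origin, key), so key 1 on two masters never collides."""
    for change in changes:
        composite_key = (change.origin, change.key)
        if change.row is None:
            replica.pop(composite_key, None)
        else:
            replica[composite_key] = change.row
    return replica


replica = {}
apply_changes(replica, [Change("paris", 1, {"total": 10}),
                        Change("paris", 2, {"total": 5})])
apply_changes(replica, [Change("lyon", 1, {"total": 7})])
apply_changes(replica, [Change("paris", 2, None)])  # a delete on one master

# A single query against the replica sees data from every master.
print(sum(row["total"] for row in replica.values()))  # 17
```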

The next in the series of brief talks was on Red Hat clustering, by Devrim Gündüz of Command Prompt, in Turkey. Gündüz discussed running PostgreSQL with Red Hat Cluster Suite, which he described as providing redundancy for data, hosts, servers, and power. According to Gündüz, there is simply no room for downtime. All it needs to work, he says, is hardware powerful enough to run Red Hat Enterprise Linux and between two and eight servers with identical configurations.

The eighth sessionlet was by Neil Conway, about TelegraphCQ, a Berkeley research project. The idea behind TelegraphCQ, he says, is to allow streamed queries: the queries are long-lived, but the data is short-lived. As an example use, Conway described security monitoring of sensor networks, with action taken based on the results of the streamed query.

More information on this project can be found at telegraph.cs.berkeley.edu.
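TelegraphCQ's own query language is not shown here; the following Python sketch only illustrates the "long-lived query, short-lived data" idea under assumed parameters. A standing query lives for the whole stream, each sensor reading is kept only while it falls inside a sliding time window, and an action fires whenever the windowed average crosses a threshold. The window size, threshold, and readings are hypothetical.

```python
from collections import deque


class SlidingWindowQuery:
    """A standing query: it lives for the whole stream, while each reading
    is kept only as long as it falls inside the time window."""

    def __init__(self, window_seconds, threshold, action):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.action = action
        self.readings = deque()  # (timestamp, value), oldest first

    def push(self, timestamp, value):
        self.readings.append((timestamp, value))
        # Expire readings that have fallen out of the window.
        while self.readings and self.readings[0][0] <= timestamp - self.window_seconds:
            self.readings.popleft()
        average = sum(v for _, v in self.readings) / len(self.readings)
        if average > self.threshold:
            self.action(timestamp, average)


alerts = []
query = SlidingWindowQuery(window_seconds=10, threshold=50,
                           action=lambda t, avg: alerts.append((t, avg)))
for t, v in [(0, 20), (4, 40), (8, 60), (12, 90), (25, 10)]:
    query.push(t, v)
print(alerts)
```

Only the reading at time 12 pushes the average over the threshold; by time 25 the earlier readings have expired and the window holds a single low value.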

The ninth session was by Alvaro Herrera on the topic of autovacuum maintenance windows. He explained that the system is based on cron, the task scheduler found on most Unix-based systems. It allows maintenance windows to be specified so that the database carries out its cleanup during each database's off-peak hours.
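The talk's actual configuration format is not given, so this Python sketch shows only the core check a maintenance-window scheduler needs: whether the current time of day falls inside a database's off-peak window, including windows that wrap past midnight. The window table and database names are hypothetical and are not cron syntax or a PostgreSQL setting.

```python
from datetime import datetime, time


def in_maintenance_window(now, start, end):
    """Return True if now's time of day falls in [start, end).
    Windows that cross midnight (e.g. 22:00 to 02:00) are handled."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


# Hypothetical per-database windows: vacuum only during off-peak hours.
windows = {
    "orders": (time(1, 0), time(5, 0)),
    "reports": (time(22, 0), time(2, 0)),  # wraps past midnight
}


def databases_to_vacuum(now):
    return sorted(db for db, (start, end) in windows.items()
                  if in_maintenance_window(now, start, end))


print(databases_to_vacuum(datetime(2006, 7, 9, 1, 30)))  # ['orders', 'reports']
print(databases_to_vacuum(datetime(2006, 7, 9, 23, 0)))  # ['reports']
print(databases_to_vacuum(datetime(2006, 7, 9, 12, 0)))  # []
```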

The final lightning talk was presented by David Fetter, about running a relational database management system as an object within the database. He briefly discussed performance differences between object-based and relational databases.

The lightning talks concluded the morning session of the first day. In the afternoon, Gavin Sherry and Neil Conway presented a pair of back-to-back, hour-and-a-half sessions titled "Introduction to Hacking PostgreSQL." After checking that nearly everyone in the room had at least a basic knowledge of the C programming language, they got into it.

You need to know C to hack PostgreSQL, Conway says. Fortunately, it's an easy language to learn. PostgreSQL, he added, is a mature codebase and good code to help learn C from. Conway says Unix system programming knowledge is useful, but not necessary, depending on what part of PostgreSQL you want to hack on.

He gave a few technical pointers on debugging: for example, if a new bug appears in your code that you can't explain, run make clean and recompile from scratch to ensure everything is current.

He recommended ensuring that you have a good text editor, suggesting Emacs, to make your life easier. He also recommended a number of tools to reduce the amount of development time wasted debugging, such as ccache, distcc, and Valgrind.

Conway and Sherry traded off for the rest of the presentation, providing an entertaining, easy-to-follow tutorial. Among other things, they warned against idiosyncrasies in coding style that annoy people and waste time for no discernible benefit. Read the code around what you are patching and make your changes conform to the adjacent style.

Before writing a patch, especially one that adds a feature, send the idea to the project to make sure it would be welcome. They cited the example of someone who wrote a 25,000-line patch that had to be rejected.

When determining what patches to write, they suggest asking yourself a number of questions, for example:

  • Is this patch or feature useful?
  • Is it a patch for the PostgreSQL back-end, or does it belong on pgFoundry or in the contrib/ directory?
  • Is it something that is already defined by the SQL standard?
  • Is it something anyone has suggested before? Check the mailing list archives and the TODO list.

Most ideas, they cautioned, are in fact bad. They also warned contributors to make sure submitted code is well commented and properly tested.

The conference will be followed by a code sprint, and they recommended checking the code sprint wiki for ideas to cut your teeth on.

PostgreSQL doesn't like centralization

The last session of the first day was on the topic of fund-raising, hosted by Berkus. The discussion started with an introduction to the Japanese PostgreSQL Users Group (JPUG) by Hiroki Kataoka.

In Japan, Kataoka says, PostgreSQL is more popular than rival database MySQL, owing largely to earlier Japanese language support in PostgreSQL. JPUG started with 32 members and eight directors on July 23rd, 1999, Kataoka says. It now boasts 2,982 members, 26 directors, and a Japanese-language mailing list with around 7,000 subscribers.

He showed a map of Japan divided into its 47 prefectures, showing which had JPUG regional chapters or otherwise had a PostgreSQL presence. Nearly half of Japan's prefectures have a JPUG regional chapter. JPUG offers a number of activities and incentives, including PostgreSQL seminars, summer camps, a regular newsletter, PostgreSQL stickers, and PostgreSQL water bottles for distribution at JPUG events. JPUG, a registered non-profit in Japan, has numerous corporate sponsors.

Jean-Paul Argudo introduced the French PostgreSQL organization, postgresqlfr.org, of which he is treasurer. It was started in 2004, Argudo says. Its Web site is powered by Drupal, and the group has a presence on irc.freenode.net in #postgresqlfr. It is a registered non-profit under the French law of 1901. It has 50 members who each pay €20 per year, and the Web site has some 2,000 users. The organization invites donations through its Web site but collected a mere €25 in Web donations in its first year.

The Web site, Argudo says, has around 1,400 pages of translated PostgreSQL documentation, information on migration, and translated news and information from the main PostgreSQL Web site. Work is in progress to produce books, he added.

Berkus introduced the rest of the world's organizations, noting that PostgreSQL currently deals with four non-profit organizations for fund-raising: JPUG in Japan, PostgreSQLfr in France, FFIS in Germany, and US-based Software in the Public Interest (SPI) for most of the rest of the world. PostgreSQL joined SPI after finding that creating its own 501(c)(3) US non-profit organization was a very difficult and expensive proposition.

PostgreSQL, he says, does not like centralization.

Following these introductions, Berkus led a discussion on the nitty-gritty of PostgreSQL's internal political structure, especially as it relates to working with the non-profit organizations and managing PostgreSQL's money.

Day two of the conference

The second day of the conference was far more intensely technical than the first, with a variety of talks by developers about their PostgreSQL sub-projects such as pgpool, pgcluster, Tsearch2, and other topics.

During the morning session, Peter St. Onge of the Department of Economics at the University of Toronto gave a talk on the role of databases in scientific research. St. Onge says that PostgreSQL's flexibility, extensibility, and speed make it ideal for the research environment.

He discussed the unique needs of databases in research environments. Each lab, he says, is different. Most currently operate on a Linux, Apache, MySQL and PHP (LAMP) platform, but research labs are switching to what he termed a Linux, Apache, PostgreSQL, and PHP (LAPP) platform.

St. Onge says his goal is to put data-handling logic into the database back-end. From samples to analysis, mass spectroscopy, further analysis, storage, and archiving, every step that a person handles creates room for error, so every step that can be automated is an improvement.

A lot of data in different labs is stored in different units, he noted. Allowing basic functions within the database such as conversion of degrees Fahrenheit to degrees Celsius, for example, would allow better integration of data from multiple research facilities.
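In PostgreSQL such a conversion would live in the database as a user-defined function; the talk does not say which language St. Onge uses, so the idea is sketched here in Python instead. The sketch normalizes temperature readings from several labs to Celsius using C = (F - 32) × 5/9 before they are combined. The lab names and readings are hypothetical.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius: C = (F - 32) * 5 / 9."""
    return (f - 32) * 5 / 9


def normalize(readings):
    """Convert (lab, value, unit) readings to (lab, degrees Celsius)."""
    converters = {"C": lambda v: v, "F": fahrenheit_to_celsius}
    return [(lab, converters[unit](value)) for lab, value, unit in readings]


# Hypothetical readings recorded in different units by different labs.
readings = [("toronto", 21.0, "C"), ("boston", 69.8, "F"), ("denver", 212.0, "F")]
for lab, celsius in normalize(readings):
    print(f"{lab}: {celsius:.1f} C")
```

Keeping the conversion next to the data means every lab's query gets the same answer, rather than each analysis script carrying its own copy of the formula.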

The 10th anniversary PostgreSQL conference went well overall. Session time limits were strictly enforced and technical problems were minimal, making for a smoothly run conference. All the talks were recorded, most of them on video, and anyone interested should be able to hear them on the conference Web site in the next few weeks.

The material was largely highly technical, often way over my head, but the people were down to earth, and judging by the reactions of the people around me, most understood and appreciated what was being said. The conference operated on a budget of around $30,000, including travel stipends for many of the presenters.

Of the 90 people registered, Berkus says that only five failed to show, "some due to specific issues (like health problems). Four 'extra' people who had not registered due to some significant communications issues did show, so we were still slightly over capacity." A further 11 people were wait-listed and unable to attend as a result.

There is already discussion of a reprise of the conference. Says Berkus, "We're currently discussing the possibility of a conference next year, maybe even a 300-attendee user conference. We're somewhat undecided about whether to do it next year or the year after though, and where it should be located. A survey will go up on the conference Web site sometime soon -- if you're interested in the next PostgreSQL conference, please watch for it (use the RSS feed) and fill it out."

As for the long term consequences of the conference, Berkus says he's "hoping that it will lead to better coordination and communication in our really far-flung community. Having developers from so many different parts of the community face-to-face, even once, should help us overcome some barriers of language, distance and time zones.

"We should see some accelerated code development soon with people sharing ideas. For example, I think the various replication/clustering teams learned a lot from each other. I also think that, having met people in person, there will be subtle changes in the way we regard each other back on the mailing lists. A bunch of people didn't look or sound like I expected. I'm not sure what those attitude changes will be, but I'll find out soon enough."

This year's conference was organized by four PostgreSQL volunteers: Berkus, Andrew Sullivan, Peter Eisentraut, and Gavin Sherry. Next time, says Berkus, they'll hire professional help to organize the conference. "Now," he says, "I'm finally going to get some sleep."

