October 24, 2005

User companies pool on PostgreSQL enhancement

Author: Jay Lyman

When faced with a pay-now or wait-for-functionality choice, several companies that use PostgreSQL recently pooled their resources and paid for development of faster, deeper indexing capabilities in the open source database.

The effort resulted in row-level locking in the GiST indexing system that bolsters PostgreSQL's enterprise-readiness, and reflects a paradigm for greater user support of open source software projects. But there were still some kinks in the process that should be worked out, according to developers and corporate sponsors of the new feature, which is slated to be included in the upcoming version 8.1 release of PostgreSQL and currently available in the PostgreSQL CVS repository.

Feature for sale, limited time only

Although there was fairly widespread recognition of the need for the improved concurrency and recovery support in GiST-based PostgreSQL extensions -- tsearch2, ltree, intarray, and PostGIS -- that need was not immediate, and certainly not cause for companies using PostgreSQL to open their pocketbooks, said Refractions President Paul Ramsey, whose consulting and product development organization is the principal developer of the PostGIS open source spatial database.

"I became aware of the need for functionality pretty early on in the life of PostGIS, and corresponded with (PostgreSQL GiST developers Oleg Bartunov and Teodor Sigaev] about it almost two years ago," Ramsey said. "The table-lock limitation of GiST was a pretty obvious future performance bottleneck. However, as a company, we did not really need the functionality at the time, so we were not personally very invested in pushing it. Casual requests to the PostGIS users list didn't elicit any response, so I left it there. It was a few months ago that they said, 'we have time to do it, but we must start now if we want 8.1,' so I hit up the list again. I think that (a) the deadline for 8.1 and (b) the changing use cases for geospatial (more write load than before, for real-time and Web apps) helped get people out of their seats and up to the CFO this time around. And the community is two years bigger, two years more companies depending on the product, so there are just more potential wallets available."

All told, the effort -- which drew support from Refractions, Cadcorp, WebBased, GlobeXplorer, Intevation GmbH, and Mobile Meridian -- garnered just under $10,000. In addition to its $2,000 contribution, Ramsey said Refractions also spent another $1,000 in staff time doing quality assurance, testing, and accounting.

"One of the facts which galvanized support was the existence of a deadline," Ramsey said. "We needed funding by time X, or the enhancement would not get done in time to be included in the 8.1 release. Knowing that they could pay now and get it for sure, or not pay and have to wait another full release cycle, helped tip some organizations to the 'pay now' column."

Development and data sets enhanced

Webbased Senior Developer Mark Cave-Ayland said he became aware of the GiST need for support through discussion on the pgsql-hackers and postgis-users mailing lists.

"As part of our main mapping system, we tend to load large amounts of PostGIS-indexed data into our database," he said. "The lack of concurrency was obvious due to the fact that we couldn't alter any of the existing data until the complete load had finished."

Cave-Ayland, whose company designs and produces Internet, database, and imaging technologies, said the feature improvement means more productive development time and faster performance for users.

"For some of our shared tables, a single update could take tens of minutes and therefore would prevent any other developers working on the system until the update finished," he explained. "For our users, the benefit is that we can load multiple data sets within the same time frame, therefore they obtain access to the data quicker."

Ramsey said prior to the upgrade, GiST indexes supported table-level locks only, which meant that in high volume read/write conditions, performance could become "very choppy."

"The PostGIS user community will probably get the biggest benefit, because they store more complex type data than any other users," he said. "For example, one of our geospatial databases is only 15,000 records, but takes up 40GB of space. Speeding up concurrent updates on that database is going to radically reduce data processing time for our projects. Online services that support a large number of simultaneous users will also get a big performance boost."

Up to the enterprise, ahead of demand

Ramsey also indicated the feature was a boost for more business use of PostgreSQL, which will also be giving users more capabilities.

"The upgrade moves the object-relational side of PostgreSQL up to enterprise level with the rest of the data types," he said. "While read performance and query planning for complex types was excellent in previous versions, applications with a heavy write load suffered. In the geospatial domain, where our company works, more and more database applications have to process high-volume simultaneous reads and writes. Real-time location tracking, for example, involves high-volume read/write for geospatial data. We want to get out ahead of demand and make sure this functionality is available when our clients ask for it in their applications."

Feature co-developer Oleg Bartunov, who works at the Sternberg Astronomical Institute at Moscow University, said GiST recovery support was also implemented within the framework of the project, which was ideal from his point of view.

"As an open source developer, I needed some organization so I could submit my proposal to get support," Bartunov said. "It was a lucky opportunity that our interests met OpenGIS requirements and as a result, we completed our long-awaited project."

Bartunov added the cooperative feature sponsorship had also created optimism for more PostgreSQL improvements, some of which are on the developer's to-do list.

"I'm preparing an extended proposal for HP Russia, with whom I have talked about PostgreSQL support here in Russia," he said. "Our goal is to develop [enhanced] built-in full text search. Now, after we made GiST enterprise ready, it looks really possible."

Model improvements?

While the feature worked out well for the companies that participated, PostgreSQL, and its users, it also showed the need for an infrastructure dedicated to such collaborative funding and administration of development, PostgreSQL's Josh Berkus indicated.

"This is the kind of development we have a call for all the time on PostgreSQL," he said. "The open source world has long needed a formal structure that supports this sort of financial collaboration, but enforces accountability, testing, and payment. The folks at Fundable.org took a stab at it, but their application doesn't go far enough to be really useful for business-funded development. So we still need a 'SourceForge for money.'"

Nevertheless, the process worked for the parties involved, according to Cave-Ayland from Webbased.

"For me personally, it seems that the idea of raising money as a group to obtain a desired feature is a very good one," he said. "Being a small company, it's not something that we could have done by ourselves, and so with everyone contributing, everyone gets to benefit."

The biggest challenge, Cave-Ayland said, was for Ramsey to organize the project and act as a middleman between the customers and the developers.

"I think it was also hard to find companies willing to support the project," he added. "I had a lot of initial resistance because it was felt that if we have a free project, why should we be paying for it. To me, it makes more financial sense this way. At the moment, we pay for each installation of Microsoft SQL Server for each new ASP server we commission. We've been using PostgreSQL at no cost for over three years, and we've just paid the equivalent of one SQL Server license for a feature that we can use across all our database servers, and that will really make a difference to our end product."

For his part, Ramsey said development of the PostgreSQL improvement was "a good conceptual model, but not a practical model."

"It only worked in this case because the density of users who could see the future value of the upgrade was sufficient within a particular community (the PostGIS community) and there was a trusted actor (Refractions) who could galvanize that community. Because we have credibility in the PostGIS community, we can stand up and say 'this is an important, useful thing, and we'll back it' and bring people along. The credibility of the sponsoring organization, combined with enough other potential supporters, is critical, and not necessarily replicated for every feature or open source community. Finally, we were willing to spend the dollars and time to make sure the effort had credibility from the start, and not every organization is going to do that."

Click Here!