April 24, 2012

Looking at the PostGIS 2.0 Release

The PostGIS database project made its long-awaited 2.0.0 release in April, marking the culmination of more than two years of development. PostGIS is an industrial-strength geographical database that serves as the storage back end for a wide range of geodata processing tools, from map servers to analysis packages.

As the name suggests, PostGIS is based on the open source PostgreSQL database. What it adds is substantial, however. It provides data types for geometric shapes such as points, lines, and polygons, along with compound objects that group and combine them. It adds raster data support for indexing imagery such as satellite photos and aligning it correctly with the coordinate system. It supplies operators for calculating intersections, measurements, and more complex geometric relationships, and a spatial index optimized for location-based searches and queries. Many of these operations do not fit neatly into the table-and-column mindset of a standard relational database, so the fact that PostGIS supports them alongside ordinary relational data is what makes it so powerful.
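Spatial indexes of this kind generally work on bounding boxes. The index first finds candidate objects whose boxes overlap the search area, and only those candidates go through the more expensive exact geometry test. In PostGIS, the bounding-box step is what the `&&` operator performs. The Python sketch below illustrates the two-step filter with plain tuples. The `BBox` type and the `bbox_overlaps` and `candidates` helpers are hypothetical names chosen for illustration, not PostGIS functions.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def bbox_overlaps(a: BBox, b: BBox) -> bool:
    # Two boxes overlap (or touch) unless one lies entirely to one side of the other.
    return not (a.xmax < b.xmin or b.xmax < a.xmin or
                a.ymax < b.ymin or b.ymax < a.ymin)


def candidates(features: dict[str, BBox], search: BBox) -> list[str]:
    # Cheap first pass: keep only features whose boxes touch the search box.
    # An exact geometry test would then run on this short list only.
    return [name for name, box in features.items() if bbox_overlaps(box, search)]


features = {
    "park": BBox(0, 0, 10, 10),
    "lake": BBox(20, 20, 30, 25),
    "road": BBox(5, -5, 6, 40),
}
print(candidates(features, BBox(8, 8, 22, 22)))  # ['park', 'lake']
```

The road's box spans the search area vertically but lies entirely to its left, so the filter discards it without any detailed geometry work.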

The new release is available for download from the project's home page. PostGIS implements the Open Geospatial Consortium's Simple Features specification for SQL, a widely adopted industry standard.

New in 2.0

The raster data support mentioned above is one of 2.0's headline features. Traditionally, map data is stored as vectors (points, lines, and shapes), but there are many situations in which overlaying vector data on top of raster images is important to analysis. The supported formats include generic image types like TIFF and PNG, but the majority are specific to the geodata field, where they carry the positioning information needed for precise alignment.
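Aligning a raster with the coordinate system usually comes down to an affine transform that maps each pixel's column and row to world coordinates. The sketch below uses the six-coefficient layout familiar from GDAL-style geotransforms: an upper-left corner, a pixel size, and two rotation (skew) terms. PostGIS rasters carry the same kind of georeferencing parameters. The `GeoTransform` type, the `pixel_to_world` helper, and the sample numbers are assumptions for illustration, not values read from PostGIS.

```python
from typing import NamedTuple


class GeoTransform(NamedTuple):
    """Affine pixel-to-world mapping (hypothetical helper type)."""
    upper_left_x: float  # world X of the raster's top-left corner
    scale_x: float       # pixel width in world units
    skew_x: float        # rotation term (0 for north-up images)
    upper_left_y: float  # world Y of the raster's top-left corner
    skew_y: float        # rotation term (0 for north-up images)
    scale_y: float       # pixel height, negative because rows run downward


def pixel_to_world(gt: GeoTransform, col: float, row: float) -> tuple[float, float]:
    # Affine transform: world = corner + col * (column step) + row * (row step).
    x = gt.upper_left_x + col * gt.scale_x + row * gt.skew_x
    y = gt.upper_left_y + col * gt.skew_y + row * gt.scale_y
    return x, y


# A north-up image whose top-left corner is at (500000, 4200000), with 30 m pixels.
gt = GeoTransform(500000.0, 30.0, 0.0, 4200000.0, 0.0, -30.0)
print(pixel_to_world(gt, 0, 0))       # (500000.0, 4200000.0), corner of the first pixel
print(pixel_to_world(gt, 10.5, 2.5))  # (500315.0, 4199925.0), center of pixel (col 10, row 2)
```

Because the mapping is a pure function of six numbers, a database can store those numbers once per raster and still place every pixel precisely relative to vector data in the same coordinate system.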

PostGIS 1.x was limited to 2D mapping, but 2.0 adds 3D and 4D support as well. 3D support lets databases model terrain in full three-dimensional space rather than only as elevation lines on a plane, so that applications can calculate distances over and around features, find volumes, and construct the intersections of 3D regions. The 4D support is neither science fiction nor hyperspatial. Rather, it lets PostGIS attach a measure value (the "M" coordinate) to each three-dimensional point, so the database can model an additional variable, such as temperature, alongside position.
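To see why the extra dimension matters, compare the planar and three-dimensional distances between two points at different elevations. This is plain Euclidean geometry in Python, not PostGIS's own 3D distance code, and the coordinates are made-up values in meters.

```python
import math

# Two hypothetical survey points: (x, y, z) in meters, z being elevation.
valley = (0.0, 0.0, 100.0)
ridge = (300.0, 400.0, 1300.0)

flat = math.dist(valley[:2], ridge[:2])  # 2D distance: ignores elevation entirely
full = math.dist(valley, ridge)          # straight-line distance through 3D space

print(flat, full)
```

Here the map-plane distance is 500 m, but the 1,200 m climb makes the true straight-line separation 1,300 m, more than two and a half times as far. A purely 2D model cannot see that difference.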

PostGIS 2.0 also allows a database to encode mathematical topology, meaning you can define regions with shared edges (or, in the 3D case, shared faces), or save weighted or directed paths. You might use these topological constructs to solve a traveling-salesman-type problem, or encode the direction of traffic on a network of streets. There are obviously other ways to approach topological problems in general; the power here is that PostGIS allows you to tackle them on a geographical model.
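Directed, weighted edges are what make one-way streets expressible, since a route that is legal in one direction may not exist in the other. The sketch below runs Dijkstra's shortest-path algorithm over a tiny hypothetical street graph in Python. It illustrates the kind of question such a model can answer. It is not PostGIS's topology API, and the intersections and travel times are invented.

```python
import heapq

# Hypothetical street graph: intersection -> list of (neighbor, travel minutes).
# The edge B -> C exists but C -> B does not, modelling a one-way street.
streets = {
    "A": [("B", 4), ("D", 2)],
    "B": [("A", 4), ("C", 3)],
    "C": [("D", 6)],
    "D": [("A", 2), ("C", 9)],
}


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on a directed graph with non-negative weights."""
    queue = [(0, start, [start])]  # (cost so far, current node, path taken)
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue  # a cheaper route to this node was already expanded
        settled.add(node)
        for neighbor, weight in graph.get(node, []):
            if neighbor not in settled:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None  # goal is unreachable


print(shortest_path(streets, "A", "C"))  # (7, ['A', 'B', 'C'])
print(shortest_path(streets, "C", "B"))  # (12, ['C', 'D', 'A', 'B'])
```

The two queries are mirror images, yet the return trip costs almost twice as much because the one-way edge forces a detour.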

Raster, 3D/4D, and topology support all extend the types of data that PostGIS can work with, but there are other improvements, too. One of the biggest is nearest-neighbor search. Just as it sounds, a nearest-neighbor search returns the n objects closest to a given starting point. The naive approach is to calculate the distance to every object in the database and sort the results, but this is slow and does not scale. PostGIS implements a far faster solution by reading results in sorted order directly from the spatial index. For point-to-point distances this is remarkably fast and simple. Distances from lines and polygons involve more math, so the database offers multiple options.
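The index-driven approach can be illustrated with a best-first traversal. Every bounding box gives a lower bound on the distance to anything inside it, so a priority queue ordered by those bounds yields objects in increasing distance order, and the search can stop after n results without touching most of the data. The Python sketch below builds a small, hypothetical bounding-box tree over points and compares it with the naive sort-everything approach. It is a simplified stand-in for the idea, not PostGIS's GiST code. In PostGIS itself, such queries are written with the `<->` distance operator in an `ORDER BY` clause together with a `LIMIT`.

```python
import heapq
import math
import random
from dataclasses import dataclass, field

Point = tuple[float, float]


@dataclass
class Node:
    """One node of a hypothetical bounding-box tree."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    points: list = field(default_factory=list)    # filled only in leaves
    children: list = field(default_factory=list)  # filled only in inner nodes


def build(points: list[Point], leaf_size: int = 4, depth: int = 0) -> Node:
    # Requires a non-empty list; splits at the median, alternating x and y.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    node = Node(min(xs), min(ys), max(xs), max(ys))
    if len(points) <= leaf_size:
        node.points = list(points)
        return node
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    node.children = [build(ordered[:mid], leaf_size, depth + 1),
                     build(ordered[mid:], leaf_size, depth + 1)]
    return node


def box_distance(node: Node, q: Point) -> float:
    # Smallest possible distance from q to anything inside the node's box.
    dx = max(node.xmin - q[0], 0.0, q[0] - node.xmax)
    dy = max(node.ymin - q[1], 0.0, q[1] - node.ymax)
    return math.hypot(dx, dy)


def knn_index(root: Node, q: Point, n: int) -> list[tuple[float, Point]]:
    # Best-first search: always expand whichever box or point is closest next.
    heap = [(box_distance(root, q), 0, root)]
    tie = 1  # unique counter so heapq never has to compare Node objects
    result = []
    while heap and len(result) < n:
        dist, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            for child in item.children:
                heapq.heappush(heap, (box_distance(child, q), tie, child))
                tie += 1
            for p in item.points:
                heapq.heappush(heap, (math.dist(p, q), tie, p))
                tie += 1
        else:
            result.append((dist, item))  # nothing left in the heap can be closer
    return result


def knn_naive(points: list[Point], q: Point, n: int) -> list[tuple[float, Point]]:
    # Compute every distance, sort everything, keep the first n.
    return sorted((math.dist(p, q), p) for p in points)[:n]


random.seed(1)
pts = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(1000)]
tree = build(pts)
query = (50.0, 50.0)
print(knn_index(tree, query, 3))
print(knn_naive(pts, query, 3))
```

Both calls return the same three points, but the tree-based search only opens the handful of boxes near the query, while the naive version computes and sorts all 1,000 distances.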

Improved Features

PostGIS's existing feature set received plenty of attention in this development cycle, including the vector storage format, the indexing system, the parsers, and more. With the 2.0 release, all of the project's original 2001 code has now been rewritten. Not all of that work happened in this cycle, but it is impressive that every nook and cranny has been updated.

There is a long list of new vector functions in this release, including functions that calculate 3D distances, split objects, create parallel "offset" lines, and automatically correct invalid or corrupt data. The importing and exporting tools received a makeover, too, allowing you to load and export multiple files simultaneously.
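As a simplified illustration of what an offset line is, the sketch below shifts a single straight segment sideways by a fixed distance along its left-hand normal. PostGIS exposes offset lines through `ST_OffsetCurve`, and real offset functions also have to handle joins between segments, curves, and self-intersections, all of which this hypothetical `offset_segment` helper ignores.

```python
import math

Point = tuple[float, float]


def offset_segment(a: Point, b: Point, distance: float) -> tuple[Point, Point]:
    """Return segment a-b moved `distance` units to its left (negative = right)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("cannot offset a zero-length segment")
    # Unit normal pointing to the left of the direction a -> b.
    nx, ny = -dy / length, dx / length
    return ((a[0] + nx * distance, a[1] + ny * distance),
            (b[0] + nx * distance, b[1] + ny * distance))


# An east-pointing segment shifted 2 units to its left ends up 2 units north.
print(offset_segment((0.0, 0.0), (10.0, 0.0), 2.0))  # ((0.0, 2.0), (10.0, 2.0))
```

Using the left normal of the segment's direction means the sign of `distance` picks the side, which is how a single function can produce both edges of a road or a buffer strip.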

There is also a new geocoder, a utility that takes a postal address or location in human-readable form and returns its geospatial coordinates. The new geocoder works with the US Census Bureau's public domain TIGER (Topologically Integrated Geographic Encoding and Referencing) database, but it can be extended to support other sources as well.

Finally, although the PostGIS project works hard to provide an all-in-one geodata tool, most real-world users have other data analysis needs. For this release, the project has made an effort to support the extension system introduced in PostgreSQL 9.1, so that PostGIS can be installed and managed like any other extension. Other Postgres extensions should be compatible, but of course you will need to run your own tests before deploying them.

The Geodata Leader

It is difficult to overstate the importance of PostGIS to the open source GIS ecosystem. The project's wiki lists more than 35 other applications that use (or can use) PostGIS as their data back end. The list includes heavy hitters such as the GRASS analysis package, the GeoServer and MapServer web mapping systems, and numerous desktop tools (even some proprietary ones).

PostGIS's main competition comes from the expensive proprietary database market, where PostGIS enjoys an excellent reputation. In March, it went head-to-head with Oracle Spatial in a benchmark competition and came out on top. And that was even before 2.0.0 was released.