OpenStreetMap (OSM) has completed the bulk import of comprehensive street and highway data for the United States, months ahead of the project's original estimates. The massive data set originated with the US Census Bureau's public domain map database, and importing it required a dedicated upload process running around the clock since August 2007. The imported data will still require human editing and error-correction, but the completed task is a major milestone for the OSM project.
As we reported in October, OSM's Dave Hansen retrieved the data from the Census Bureau's Topologically Integrated Geographic Encoding and Referencing (TIGER) system and converted it into an OSM-friendly format offline.
Once satisfied with the results, the project began importing the data into the main OSM system using three dedicated daemons running concurrently. Routing all of the imported data through the OSM server's API, just like manually collected GPS trace data, made the process time-consuming, but it was safer than bypassing the API and altering the database directly.
Quadtiles to the rescue
At the beginning of the process, the predicted completion date hovered between late May and early June 2008. Luckily, admin Tom Hughes found a way to re-index the database using quadtiles, resulting in greatly shortened database lookup times.
Quadtiles recursively split each quadrant of the map into four subquadrants, allowing for better space efficiency by only subdividing those quadrants that require more detail -- a quadrant containing only ocean and therefore no roads, for example, would not require subdivision, whereas a metropolitan city center would.
Indexing the database with quadtile keys resulted in two benefits. First, the quadtile keys are shorter -- 32 bits (4 bytes) as opposed to 16 bytes for the old latitude/longitude indices -- so less memory is required. Second, because of quadtiles' hierarchical nature, geographically close nodes tend to be adjacent in the database index, which improves cache performance.
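To make the idea concrete, here is a minimal sketch of how a 32-bit quadtile key can be built by interleaving the bits of a scaled longitude and latitude. It is an illustration only, not the project's actual code: the function names, the fixed depth of 16 levels, and the scaling are assumptions chosen to yield a 32-bit key.

```python
def quadtile_key(lat: float, lon: float, bits: int = 16) -> int:
    """Return a Z-order (quadtile) key for a latitude/longitude pair.

    Hypothetical helper: each coordinate is scaled to a `bits`-bit
    integer, then the two integers are interleaved bit by bit, so every
    pair of key bits selects one of four subquadrants at that level.
    """
    scale = (1 << bits) - 1
    x = round((lon + 180.0) / 360.0 * scale)  # 0 .. 2**bits - 1, west to east
    y = round((lat + 90.0) / 180.0 * scale)   # 0 .. 2**bits - 1, south to north
    key = 0
    for i in range(bits - 1, -1, -1):         # most significant bit first
        key = (key << 1) | ((x >> i) & 1)
        key = (key << 1) | ((y >> i) & 1)
    return key


def shared_prefix_bits(a: int, b: int, total_bits: int = 32) -> int:
    """Count the leading bits two keys have in common."""
    return total_bits - (a ^ b).bit_length()


if __name__ == "__main__":
    london = quadtile_key(51.5074, -0.1278)
    nearby = quadtile_key(51.5080, -0.1270)
    sydney = quadtile_key(-33.8688, 151.2093)
    # Nearby points share a long key prefix; distant points do not.
    print(shared_prefix_bits(london, nearby), shared_prefix_bits(london, sydney))
```

Because two keys with a long common prefix fall in the same small quadrant, sorting by key tends to place neighboring nodes next to each other in the index, which is the locality benefit described above.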
The speed gains resulting from the new database index affected all API requests, not just imports, and earned Hughes a lolcat of awesomeness award from the other OSM participants.
Now the real work begins, and you can help
The TIGER data set covers roughly 6% of the Earth's land surface. Its successful import does not mean that the work is finished. Users who have collected their own GPS logs in areas covered by the TIGER maps and uploaded the resulting data report sporadic problems with TIGER's information. Problems include misalignment of roads, missing features (including the regular absence of on-ramps and access roads, and representation of divided highways as a single road), and occasional confusion on features such as cul-de-sacs. Since the TIGER map data was produced from aerial photography, and was originally intended to assist Census Bureau officials in the field, such problems are bound to occur and are unlikely to have undergone official correction.
But even with its faults, the TIGER data is orders of magnitude better than no data at all, and serves as an excellent baseline for community improvement. OSM provides interested users with tutorials and helpful hints to aid them in correcting the problems they encounter. Users collecting their own GPS routes can correct mistakes in the imported TIGER data using the Java OpenStreetMap Editor (JOSM) developed by the project.
OSM has performed one other bulk data import. In July 2007, AND Automotive Navigation Data donated a comprehensive road map of the Netherlands and highway system maps of India and China. The Netherlands import was completed last fall, but the data for the India and China import has yet to be released to the project.
Although other prospects for large-scale map donations have been discussed, none are on the horizon. The addition of the AND and TIGER information gives the OSM project a helpful boost, but the bulk of the future work remains in the hands of individual users, each contributing their input toward the whole.