July 24, 2008

GovTrack opens up information on US legislature

Author: Tina Gasperson

Since 2004, GovTrack.us has housed information about the United States Congress, including 10 years of bills, voting records, and contact information for individual members of Congress. Visitors can also find out who represents them and search the database for committee assignments, legislative statistics, and the Congressional Record, which is the official record of daily proceedings in Congress. All the code that makes GovTrack run is open source, and all the information stored there is freely available to everyone.

GovTrack.us offers visitors flexible and powerful search options and presents the results in an easy-to-use, attractive format. Many pages use the Google Maps API and drill-down screens to help users find more information about their local senators and representatives, as well as bills and other legislative actions going back to the 103rd Congress of 1993-94. You can also add GovTrack's search engine to the Firefox search toolbar so you can search for the information you need at any time.

Josh Tauberer is the creator and curator of GovTrack.us. A fifth-year graduate student at the University of Pennsylvania, he is working toward a Ph.D. in linguistics. The idea for the site grew out of work he did for a freshman class on copyright law, for which he used Thomas.gov, the Library of Congress's legislative information Web site. "They have a lot of information -- it was remarkable to me that you could find out about the status of legislation and every action taken on every bill in not quite real time. It was powerful and interesting."

Tauberer wasn't happy with the way the information was presented. "You could search it, and sort of browse it. But RSS feeds were just starting at that time, and I thought that would be an interesting way of following what was going on in Congress." In 2001, Tauberer began picking the site apart to see whether he could get at the underlying data and repackage it in a more user-friendly way. He says the project became less about politics and more about "doing something with that mass of information."

The data housed at Thomas.gov and other government sites was not published in an open format, and the Library of Congress didn't readily respond to Tauberer's requests for bug fixes. "They didn't give me a hard time," he says, "but they weren't particularly helpful either. The problems were all kind of small but they added up. The data you need isn't always there, and you don't know what it's going to look like. The biggest problem was taking the names of members of Congress and normalizing that and turning them into database identifiers. The Thomas Web site would use one set of names, but in the Senate records, the name may be slightly different." Tauberer ended up reverse-engineering the Thomas.gov data and recreating its databases himself. He coded the site's search features and front end, and released everything as copyleft and open source.
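The name-matching problem Tauberer describes is a classic record-linkage task. The Python sketch below is illustrative only and is not GovTrack's actual (Perl) code: it assumes a hand-maintained alias table (the `ALIASES` dictionary, the sample names, and the `M0001`-style IDs are all hypothetical) and normalizes each raw name by stripping accents, titles, middle initials, and party/state tags before looking it up.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical alias table: normalized name -> internal member ID.
# These names and IDs are illustrative, not GovTrack's real data.
ALIASES = {
    "john smith": "M0001",
    "jonathan smith": "M0001",
    "maria lopez": "M0002",
}

# Titles and suffixes that vary between sources and carry no identity.
DROP_WORDS = {"sen", "senator", "rep", "representative",
              "mr", "mrs", "ms", "dr", "jr", "sr"}


def normalize(raw: str) -> str:
    """Reduce a name as printed by one source to a comparison key."""
    # Strip accents: "López" -> "Lopez".
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Drop parenthetical party/state tags such as "(D-NM)".
    text = re.sub(r"\(.*?\)", " ", text)
    # Reorder "Last, First" into "First Last".
    if "," in text:
        last, _, rest = text.partition(",")
        text = rest + " " + last
    # Remove periods and apostrophes: "O'Brien" -> "OBrien", "Q." -> "Q".
    text = re.sub("[.'\u2019]", "", text)
    tokens = re.findall(r"[a-z]+", text.lower())
    # Discard titles, suffixes, and single-letter middle initials.
    kept = [t for t in tokens if t not in DROP_WORDS and len(t) > 1]
    return " ".join(kept)


def resolve(raw: str) -> Optional[str]:
    """Map a raw name to a member ID, or None if no alias matches."""
    return ALIASES.get(normalize(raw))


if __name__ == "__main__":
    for name in ["Sen. John Smith", "SMITH, John Q.",
                 "Rep. María López (D-NM)", "Jonathan Smith, Jr."]:
        print(f"{name!r} -> {resolve(name)}")
```

Names that still fail to resolve (nicknames, for example) would have to be added to the alias table by hand, which is the kind of small, accumulating cleanup Tauberer describes.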

Tauberer says the site consists of four parts, for those interested in using the code and the data format. The front end is written in ASP.Net with a custom page-generation system. The legislative database is a large collection of XML files containing information he pulled from government Web sites. Then there are the APIs, and finally the back end, a collection of Perl scripts that query government sites to pull in the latest information. All of it is available for download at GovTrack.us, along with tutorial information on getting your own copy of the site up and running.
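To give a feel for working with a database stored as XML files, here is a minimal Python sketch that reads one bill record. The element and attribute names (`bill`, `session`, `type`, `number`, `sponsor`, `action`) and the `Bill` class are a hypothetical schema invented for this example, not GovTrack's documented format; the downloadable files are the place to check the real structure.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical record format, invented for illustration.
SAMPLE = """<bill session="110" type="h" number="1234">
  <title>Example Act of 2008</title>
  <sponsor id="M0001"/>
  <actions>
    <action date="2008-01-15">Introduced in House</action>
    <action date="2008-03-02">Passed House</action>
  </actions>
</bill>"""


@dataclass
class Bill:
    bill_id: str
    title: str
    sponsor_id: str
    actions: List[Tuple[str, str]] = field(default_factory=list)


def parse_bill(xml_text: str) -> Bill:
    """Parse one bill record in the hypothetical format above."""
    root = ET.fromstring(xml_text)
    # Build an ID like "h1234-110" from type, number, and session.
    bill_id = f"{root.get('type')}{root.get('number')}-{root.get('session')}"
    title = (root.findtext("title") or "").strip()
    sponsor = root.find("sponsor")
    sponsor_id = sponsor.get("id", "") if sponsor is not None else ""
    # Collect (date, description) pairs in document order.
    actions = [(a.get("date", ""), (a.text or "").strip())
               for a in root.iter("action")]
    return Bill(bill_id, title, sponsor_id, actions)


if __name__ == "__main__":
    bill = parse_bill(SAMPLE)
    print(bill.bill_id, "-", bill.title)
    for date, what in bill.actions:
        print(" ", date, what)
```

One plausible advantage of keeping each record in a small, plain XML file is that other sites can mirror and reprocess the data with ordinary tools, which fits a project that publishes its raw data for reuse.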

Tauberer first became interested in open source in 2003, in the middle of working on the site. "I got interested in the Mono project and XSLT. I contributed a few patches to Mono at the time, which were later replaced by other things -- for good reasons," he says. "I was a small contributor, but I began to understand open source. Now it feels like I've always known about it. I hadn't really thought about GovTrack.us as an open source project in the beginning. I was fairly isolated, though there were other people doing related things."

Tauberer was motivated to make GovTrack.us into a full-fledged open source project when he saw commercial companies building sites like his but keeping their code, and the information it produced, proprietary. He licensed his code under the terms of the GNU AGPL and copylefted all original data. "It kind of bothered me that companies were holding onto this information very tightly. It seemed like the type of information that ought to be public. No one should be able to restrict your ability to use this information. I kind of wanted to 'stick it to the man.' So when the site started (in 2004) I decided that all the data I collected would be made available to other sites."

The next steps

Tauberer says that recently he's been thinking about the idea of community. "Over the last month, I've thought about how to build a real developer thing -- a wider thing." He created OGOSH -- the Open Government, Open Source Hacking group -- and invited people to join him on Facebook. "I invited the people I knew would be interested, and they invited some people I didn't know. It's really just kind of a protogroup right now. I was trying to get the numbers up in the group before trying to get them to rally around anything."

Tauberer just launched a community Q&A section on GovTrack.us. "People can submit questions and other people can submit answers," he says. Tauberer decided that adding a Q&A section would make the site more interactive and more useful. At the top of every bill page, there is a box for submitting questions about the bill. He says there has been a lot of activity in the new section. "It feels like there is a community growing there too."

Tauberer hopes that government data will become easier for the general public to access, so that everyone can be better informed about what's going on in Congress. "I have dreams for what I think should happen," he says. "I'd like to see Web sites with common data standards that make it easier to mesh data sets, and a formal nonprofit organization that can channel funding into the right places." He compares his vision for the OGOSH community to Mozilla. "That's a mix between paid people and people working in their spare time." He would also like to see the community grow on the non-programming side: "a community of policy wonks that wants to contribute in the same way as a volunteer open source programmer does. There's a community of people interested in policy that wants to talk about it. It's a matter of finding them and making the barrier to entry very low."

