Census Bureau: Open Source makes sense to deliver stats on the Web

By Grant Gross

If you’re checking out demographic information at the U.S. Census Bureau’s Web site, there’s a chance the information is courtesy of several Open Source tools. Two senior technology architects with the Census Bureau’s Internet division said the low cost of Open Source software and strong support from the developer and user communities make Open Source the right choice for several Web-based projects at Census.gov.

Lisa Wolfisch Nyman and Rachael LaPorte Taylor described five Open Source Web projects at the Census Bureau on Tuesday during one of the Open Source Software for e-Government series of discussions sponsored by the General Services Administration, the National Science Foundation and the Cyberspace Policy Institute at The George Washington University. The talk, held at the NSF just outside of Washington, D.C., drew about 30 government employees and private sector developers, two-thirds of whom raised their hands when asked if they were Open Source advocates.

On several Web projects, the Census Bureau has used the LAMP suite of Open Source tools: Linux, the Apache Web server, the MySQL database, and the Perl, PHP and Python programming languages. In addition to saving tax dollars, Open Source software makes sense at an agency that uses a combination of Linux, Unix and SGI machines — and even a couple of Windows boxes, said Nyman.

“[Open Source] fits our heterogeneous environment,” Nyman said, listing off some benefits. “In government, we don’t have to worry about procurement — purchase orders, contracts, anything like that. We have access to support, which some people say is a myth. We actually know a lot of the authors of the Open Source software; we know them personally and professionally. It really helps when you can call up Joe and say, ‘hey, there’s a problem with your modules, can you help with it?’ Or even, ‘boy, it would really help us if you added this.’”

Among the recent Web projects at the Census Bureau that use the LAMP Open Source tools (not counting other Open Source projects at the agency):

  • The Census 2000 Internet Questionnaire. Nyman said she was a bit disappointed that only 67,000 U.S. residents decided to fill out their Census forms online, but the 2000 online form will serve as a prototype for future online efforts.

  • The rates.census.gov project, part of the “How America Knows what America Needs” campaign encouraging better response rates to the 2000 census. The rates site tracked the response of local census efforts, and the 67% final response rate marked the first time in the history of the census that the rate had increased, said Taylor.

  • The state and county QuickFacts site, where visitors can click on maps for easy-to-find statistics on their home states or counties. For example, the county where I grew up in North Dakota has a population of 5,102, with 26% of its population aged 65 or older and a median monthly income of $27,798. Over 19% of children there are living below the poverty line, and there are 773 college graduates there, according to the latest numbers. Nyman said the QuickFacts site, designed to be easy enough for sixth-graders to use, gives visitors a nutshell view of their counties or states, without having to deal with complicated search procedures elsewhere at census.gov. The project has no budget and is created by volunteer staff at the Census Bureau, making the Open Source tools particularly important, she added.

  • The MapStats project, another map-based site that compares state, federal judicial district or congressional district statistics to national stats. The GSA has recognized the project as a good use of Open Source tools in government, said Taylor. MapStats is part of FedStats.gov, a gateway to statistics from more than 100 U.S. agencies; FedStats.gov also uses several Open Source tools.

  • The Census Bureau also uses Open Source software to collect and disseminate economic indicators required of member countries of the International Monetary Fund. Taylor said the U.S. model is generating interest from several other countries.

Asked if their department has encountered resistance from superiors at the Census Bureau, Nyman said that hasn’t been the case, at least at their agency. “People who are at the highest levels, I think they understand Open Source and what we’re trying to do,” she said.

Taylor did admit that saving money at the federal level by using Open Source can come back to haunt the agencies that do so. “Once you save money, that money is taken away from you and used on projects that aren’t yours,” she said. “[The attitude is] ‘Look, they’ve saved all this money, and we’re going to take it away and buy (proprietary) products.’”

After Taylor and Nyman spoke, Mårten Mickos, CEO of MySQL AB, gave a presentation on the company’s business model. MySQL databases feature prominently in the Census Bureau’s Open Source projects.

Mickos said his company, unlike some Open Source companies, has been in the black since 1996. MySQL AB offers support and training courses for MySQL, and the company does sell a commercial version of MySQL. The product is dual licensed under the GNU General Public License and a commercial license, so that users have a choice of using it for free or paying for it if they want to avoid the code-sharing requirements of the GPL.

Either way, it’s the same product, Mickos said, and the company gets the benefit under the GPL of having thousands of developers contributing to the code and doing grass-roots marketing. Those benefits balance out any monetary losses from not having a completely proprietary product. “We’re not sorry that people are using it for free,” he said. “We’re not sorry that someone could spend a couple of million dollars but gave us $2,000 instead.”

Sasha Pachev, senior developer with the company, also outlined how MySQL is running large chunks of the finance.yahoo.com site. One Linux machine using MySQL is handling the bulk of the load for 331 tables averaging 50 megabytes apiece, he said. The peak load is 1,200 queries per second.
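For a rough sense of scale, the figures Pachev quoted can be multiplied out. The short Python sketch below is only back-of-the-envelope arithmetic on the numbers from the talk. It assumes decimal units (1,000 megabytes per gigabyte), and the queries-per-day figure is a ceiling that assumes the peak rate held around the clock, which the talk did not claim. The function names are made up for illustration.

```python
# Figures quoted in the talk.
TABLE_COUNT = 331      # tables served by the one Linux machine
AVG_TABLE_MB = 50      # average table size, in megabytes
PEAK_QPS = 1200        # peak load, in queries per second

SECONDS_PER_DAY = 24 * 60 * 60


def total_data_mb(table_count, avg_table_mb):
    """Approximate total table data, in megabytes."""
    return table_count * avg_table_mb


def queries_per_day_ceiling(peak_qps):
    """Upper bound on daily queries if the peak rate never dropped (an assumption)."""
    return peak_qps * SECONDS_PER_DAY


total_mb = total_data_mb(TABLE_COUNT, AVG_TABLE_MB)
total_gb = total_mb / 1000  # decimal gigabytes, an assumption
daily_ceiling = queries_per_day_ceiling(PEAK_QPS)

print(f"Total table data: about {total_gb} GB")
print(f"Ceiling at sustained peak: {daily_ceiling:,} queries per day")
```

By this estimate the tables add up to roughly 16.5 gigabytes, and the machine would answer at most about 104 million queries a day if it ran at peak nonstop.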

Asked if that load is MySQL’s limit, Pachev said that’s as big a site as MySQL runs, but that doesn’t mean the database is at its limit. “It’s like a Ferrari — it can go 200 miles per hour, but nobody dares drive it that fast,” he said. “Yahoo Finance using MySQL is like driving a Ferrari 120 miles per hour.”