January 29, 2007

Fedora's metrics have ripple effect

Author: Lisa Hoover

Fedora announced this month that by using a tracking tool to monitor unique IP addresses, it was able to determine that Fedora Core 6 now has more than one million users. What does all this metric gathering mean for future Fedora releases? Moreover, what does it mean for the Linux community at large? The answer on both counts: plenty.

Fedora decided to track metrics with the release of Fedora Core 6 (FC6) because the lack of data from previous releases made it difficult to be sure what users value in Fedora Core packages. Focus groups like November's Fedora Summit help the team plan ahead for future releases but don't tell whether they've hit the mark once the distribution is released. According to Fedora Project Leader Max Spevack, the best way to serve the Fedora community is to understand what it is they're looking for and then deliver. Metrics help the team determine where they have succeeded and where they could do better.

The method

As the release of Fedora Core 6 drew near, team members knew they wanted to be able to track statistical information to better understand how people use Fedora. The team turned to the user community for ideas on what types of data to collect and what methods to use. Suggestions on the Fedora Metrics wiki range from user surveys and registration to embedding a file within the package that would send user data to a central server. The overriding concern from the team and user community alike is privacy, so invasive and sly data collection methods are not being considered.

"The different methods discussed ranged all the way to very intrusive registration with UUID," says newly-appointed Fedora Infrastructure Leader Mike McGrath. "We like to avoid being evil so we're going to do whatever we can to make sure people can not participate and to make sure stuff is submitted anonymously. The only way to track it back to a machine or user is for the user to actually give us the identifier and say, 'My sound doesn't work, here's its profile: 3333-44-2-22322223424.'"

Cacti, an open source data collection and graphing tool, was already monitoring other pieces of Fedora's infrastructure, so using it with FC6's release was a natural extension. Setting up and implementing Cacti was a group effort among several members of the infrastructure team who worked diligently to get it ready for FC6's release date, and most of the information about what they did "is in the heads of the infrastructure guys," Spevack says -- but anyone who would like to discuss how to implement something similar for their project is welcome to contact the team.

Cacti tracks the number of unique IP addresses that connect to yum with a new installation of FC6 in search of updates. Determining the number of unique IP addresses is the main focus of this metric, but McGrath says several other pieces of information, as yet to be determined, will be collected following the release of FC7.
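
The article does not describe how the counting is actually implemented, so what follows is only a minimal sketch of the general idea of tallying unique IP addresses from update requests. It assumes a web-server-style access log in which the client address is the first whitespace-separated field; the function name, the sample log lines, and the request paths are all hypothetical and are not taken from Fedora's infrastructure.

```python
from typing import Iterable


def count_unique_ips(log_lines: Iterable[str]) -> int:
    """Count distinct client addresses in access-log lines.

    Assumption: the client IP is the first whitespace-separated
    field on each line. Blank lines are skipped.
    """
    seen = set()
    for line in log_lines:
        fields = line.split()
        if fields:
            seen.add(fields[0])
    return len(seen)


# Hypothetical sample data using documentation-reserved addresses.
sample_log = [
    '192.0.2.10 - - [24/Oct/2006:10:00:00 +0000] "GET /updates/repomd.xml HTTP/1.1" 200 951',
    '192.0.2.10 - - [25/Oct/2006:09:30:00 +0000] "GET /updates/repomd.xml HTTP/1.1" 200 951',
    '198.51.100.7 - - [25/Oct/2006:11:15:00 +0000] "GET /updates/repomd.xml HTTP/1.1" 200 951',
    '',
]

unique_count = count_unique_ips(sample_log)
print(unique_count)
```

A count like this is an estimate rather than an exact user tally: several machines behind one shared address are counted once, and a single machine whose address changes can be counted more than once.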

According to Spevack, it's not enough to simply count how many times the distribution has been downloaded; it's also important to gather data that will help developers determine what to focus on for future releases. Spevack says knowing what packages are getting the most bugs filed, which are being installed most often, and so on gives the team a clearer understanding of what the user community likes and dislikes about Fedora.

While only minimal information was collected during the release of FC6, the team hopes to cull much more data with the release of Fedora Core 7 later this year. Spevack says that, as with the metrics collected from FC6, "We're going to put the results out regardless of what they show. If the numbers are good, that's nice. If not, well, then we have a benchmark and it tells us where to improve.

"Statistics is just one way of understanding how people are using our software. Any insight we can get into how folks are using Fedora helps us to make better decisions."

The metrics gleaned from Fedora's data collection amount to more than just a chance for developers to pat themselves on the back, however. They also provide the opportunity to show the growing number of Linux users within the computing community, which in turn may goose hardware vendors into offering more Linux-friendly goods and services.

"This provides objective data that helps prove Linux is growing and it helps build a case for Linux in general," says Spevack. "Also, we always say we wish hardware vendors had more [Linux-capable] drivers. Well, if you can go to them and say, 'Hey, there's millions of people using this,' then maybe they will listen. In the real world, you need data to prove your case. Well, here it is."

Although neither Red Hat nor Fedora has approached any vendors with the results of Fedora's metrics, Spevack says Red Hat remains committed to urging vendors to continue to be Linux-friendly, and "if Fedora's numbers can add another arrow to the quiver, then excellent."

Better software through better metrics

A final decision on what metrics will be collected and what methods will be used is still weeks away, but McGrath says end-user participation will not be mandatory. "Users who are highly concerned about security can simply not participate, though I'd like to note that while in the minority, they are a very vocal group," says McGrath. He goes on to say most people are in favor of gathering metrics as long as there is a purpose and a goal behind it, not just random, meaningless data collection.

Like Spevack, McGrath says that thorough data collection will ultimately lead to better Fedora packages. In addition to gathering metrics on user hardware, he would "also love to get a proper survey engine so we can flat out ask people what they're using our software for. We'd also be interested in getting a package list though that's down the road. This would be useful to see what packages are popular and which ones are just duds."

The team would like to collect more elaborate data for the release of Fedora Core 7, so discussions are underway to expand on the data collection method now in place. Though there is still work to be done on the metrics-gathering tools that will be used during the release of Fedora Core 7, team members say they will be ready in plenty of time for its April release. Spevack says he is looking forward to sharing the results and getting community feedback. McGrath agrees and says he wishes more software vendors would collect and share similar information. "To my knowledge no one else is actually showing users the math and methods to estimate install base, popular architectures, etc. It's a shame."

