July 27, 2007

Fedora stats offer insight into Linux usage

Author: Joe 'Zonker' Brockmeier

The Fedora Project offered a peek under its kimono recently with details about Fedora 7 adoption and other statistics. Fedora 7 has snagged more than 300,000 users since its release at the end of May. While that sounds pretty good, Fedora Core 6 managed to attract more than 400,000 in roughly the same amount of time after its release. We asked Max Spevack, the Fedora project leader, whether the numbers are telling the full story.

Say what you will about proprietary software companies, but the successful ones do have plenty of data about users -- what the software is being used for, how many users they have, what types of machines they're running the software on, whether they wear boxers, briefs, lacy underthings, or nothing at all. The proprietary folks get reams of data from their users.

Linux distributors, on the other hand, are mostly in the dark. The downside to freely redistributable software is that there's no way to know for sure how many people are using it, nor can projects form a clear picture of the average user.

We still have no firm idea just how many people and organizations are using Linux in general, but the Fedora Project is starting to get some idea of how many systems are running Fedora and what type of hardware it's being used on.

What about Fedora 7?

Spevack says that he's not concerned that Fedora 7 has only seen about 80% of the users that showed up for Fedora Core 6 in the first few weeks. "I'm pleased with the numbers that we are seeing so far for Fedora 7. Everyone, including myself, was blown away with the Fedora Core 6 numbers, and with a distribution that is as fast-moving as Fedora is, I'm fine for now with just being in the same ballpark. We've been averaging about 75,000 new IP addresses each week since the release, so no one is walking around disappointed."

But why isn't Fedora 7 hitting the same numbers as Fedora Core 6, if not more? Spevack suggests that it may be related to the new support lifespan for Fedora Core 6. "Because of the changes that we made a couple of months ago that extend the lifespan of a Fedora release, there is no pressure to update every single time. Any given release of Fedora (call it Fedora X) is maintained until one month after the release of Fedora X+2.

"As such, if a user wants to be on an every-other-release update schedule, the user can do so without ever being in a place where their machine will not receive updates. So I think that the slower adoption rate may be related to that -- people don't *need* to update right now, and so perhaps not everyone has yet."
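To make the overlap concrete, here is a minimal sketch of the "Fedora X is maintained until one month after Fedora X+2" rule. The release numbers (1 to 3) and dates are hypothetical and do not reflect any official Fedora schedule. "One month" is approximated as 30 days, which is an assumption.

```python
from datetime import date, timedelta

# Hypothetical release dates for three consecutive releases X, X+1, X+2.
# Numbers and dates are illustrative only, not an official Fedora schedule.
releases = {
    1: date(2007, 1, 15),
    2: date(2007, 7, 15),
    3: date(2008, 1, 15),
}

# Assumption: "one month" is approximated as 30 days.
GRACE = timedelta(days=30)

def end_of_life(releases, x):
    """Release X is maintained until one month after release X+2 ships."""
    return releases[x + 2] + GRACE

eol = end_of_life(releases, 1)
print(eol)  # 2008-02-14

# A user who skips release 2 can jump from 1 straight to 3 and still
# has a window in which both releases receive updates.
print((eol - releases[3]).days)  # 30
```

Because release X stays supported after X+2 arrives, an every-other-release user never lands on an unsupported system.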

Tell me about your hardware

Not content to know how many systems are running Fedora, the project has also been working on Smolt, a hardware profiler geared toward gathering hardware data from users automatically. Spevack says that the tool is opt-in only, and that "we are building a community around Smolt that extends beyond Fedora, and into other Linux distributions."

To get other distros in on the act, the Fedora developers have invited them to use Smolt.

According to Spevack, "We are trying to build Smolt so that it can be a general upstream project usable for all Linux distributions, and not just Fedora. If (a) anyone can set up a Smolt server and (b) Smolt client packages are available for as many distributions as possible, then eventually we will have a very robust database of hardware devices that are being used on Linux systems. This is useful for convincing hardware manufacturers to open their specifications. I envision functionality appearing that allows a user to not only submit their hardware, but also submit comments about how well it works or doesn't work under a particular distribution, which also helps with bug tracking, development, etc.

"But the short answer is: Smolt is built as free software, and we'd like anyone in the world to be able to use it, and help contribute to it."

However, the data gathered so far may not give the most accurate picture of Linux users, because it's skewed toward desktop systems, or at least toward systems that were set up with a GUI.

Spevack says that "about 98% of the machines registered show up as being in runlevel 5," which is the default runlevel for desktop users. "If you install a Fedora 7 machine with any GUI, an application called firstboot runs which helps you set the time zone, create the first user, set up firewall and SELinux, etc. One of the screens in firstboot is the 'do you want to send your hardware profile' screen. Therefore, it is very easy for a user with a GUI to submit their profile."

Conversely, users who set up Fedora as a server OS are unlikely to see the invitation to submit data, and Spevack says that means "the current Smolt statistics are probably a very good picture of what desktop installs of Fedora 7 look like."

So what do desktop installs of Fedora 7 look like? According to the data, nearly 60% of the more than 82,000 registered systems are recognized as desktop systems, and 21% of systems are laptops. There are a lot of unknown systems, with about 20% unrecognized, and only 0.7% of the systems are registered as servers.

Fully 8% of users are running Fedora under VMware, according to the vendor page, making it the top-ranking recognized vendor. "System manufacturer" actually clocks in first, but VMware leads the pack of recognized vendors, followed by Hewlett-Packard (5.7%) and IBM (2.7%), while Apple lags with only 0.3% of recognized systems.

ATI, just barely, leads the pack for video cards, with 41.1% of systems tallied, versus 40.1% of systems that include Nvidia cards. The math is a bit suspect, however: the next leading vendor, Intel, has 23.8% of the registered systems, VMware shows up with 8.1%, Via Technologies pulls in 2.3%, and several other vendors show up as well, for a total of well over 100% of systems. Some users may be running machines with multiple video cards, though, which could explain why we see more video cards than systems.
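The figures quoted above already add up to 115.4% (41.1 + 40.1 + 23.8 + 8.1 + 2.3) before the smaller vendors are counted. The article doesn't say how Smolt computes these shares. The minimal sketch below assumes that each vendor's share is the fraction of systems containing at least one of its cards. Under that assumption, a machine with both an Intel and an Nvidia adapter counts toward both vendors. The sample systems are made up for illustration and are not Smolt data.

```python
from collections import Counter

# Hypothetical sample: each system lists the vendors of the video cards it contains.
# Illustrative values only, not real Smolt records.
systems = [
    ["ATI"],
    ["Nvidia"],
    ["Intel", "Nvidia"],  # e.g. a laptop with integrated plus discrete graphics
    ["VMware"],
]

def vendor_shares(systems):
    """Percentage of systems containing at least one card from each vendor."""
    counts = Counter()
    for cards in systems:
        for vendor in set(cards):  # count a vendor at most once per system
            counts[vendor] += 1
    total = len(systems)
    return {vendor: 100.0 * n / total for vendor, n in counts.items()}

shares = vendor_shares(systems)
print(shares)
print(sum(shares.values()))  # 125.0: one system is counted under two vendors
```

Each per-vendor percentage is accurate on its own terms. The total only exceeds 100% because the denominator is systems, not cards.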

How stats are gathered

How reliable are Fedora's numbers? The project's methods aren't going to gather a perfect picture, but they're reliable enough to get a reasonable ballpark figure. Spevack says that the project looks at "connections that are being made to the 'updates' repository for Fedora." Each connection reveals which version of Fedora is being used and the IP address the system is connecting from.

Spevack acknowledges that this isn't perfect. "Dynamic IP addresses are likely to be counted twice, and things like proxies are likely to make large corporate installations of Fedora look like a single IP address ... but the numbers we get out of this examination give us a decent ballpark."
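The counting approach Spevack describes can be sketched as a count of distinct IP addresses per release. This is a hypothetical illustration, not Fedora's actual tooling. It assumes a simplified log in which each entry records only the client IP and the Fedora release requested from the updates repository. The real log format isn't described in the article. The sample entries reproduce both caveats: one machine whose dynamic address changed is counted twice, and several machines behind a corporate proxy collapse into one address.

```python
from collections import defaultdict

# Hypothetical, simplified log entries: (client_ip, fedora_release).
# Addresses come from the RFC 5737 documentation ranges.
log = [
    ("198.51.100.7", "7"),   # a home machine...
    ("198.51.100.42", "7"),  # ...the same machine after its dynamic IP changed
    ("203.0.113.1", "7"),    # corporate proxy: several desktops, one address
    ("203.0.113.1", "7"),
    ("203.0.113.1", "6"),    # a Fedora Core 6 box behind the same proxy
    ("192.0.2.15", "6"),
]

def unique_ips_per_release(entries):
    """Count distinct client IP addresses seen for each release."""
    seen = defaultdict(set)
    for ip, release in entries:
        seen[release].add(ip)
    return {release: len(ips) for release, ips in seen.items()}

print(unique_ips_per_release(log))  # {'7': 3, '6': 2}
```

The two errors push in opposite directions, overcounting dynamic addresses and undercounting proxied networks. That is why the result is best read as a ballpark figure rather than an exact install count.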

He also notes that the caveats are indicated on the stats page so that readers are well aware of the methods used and their flaws -- something other distros may not do. "I see a lot of talk about different distributions having install bases in the 'millions of users,' but not a lot of talk (outside of Fedora) about where those numbers come from. We try to present our data, warts and all, without forcing people to ask for the extra detail."

Other interesting tidbits

The project also gathers stats on BitTorrent downloads, which indicate that about 75% of the downloads are the Fedora DVD, with about 25% of the downloads split between the GNOME and KDE live CDs. GNOME fans outnumber KDE fans, at least as far as being consumers of the live CDs, with 60% of users opting for GNOME and 40% opting for KDE.
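The 60/40 GNOME/KDE split applies only to the live CD quarter of downloads, so each desktop's share of all BitTorrent downloads is smaller. Here is a quick calculation using the approximate figures quoted above:

```python
# Approximate shares from the BitTorrent stats quoted above.
dvd_share = 0.75
live_cd_share = 0.25
gnome_fraction_of_live = 0.60
kde_fraction_of_live = 0.40

# Each desktop's share of *all* BitTorrent downloads.
gnome_overall = live_cd_share * gnome_fraction_of_live
kde_overall = live_cd_share * kde_fraction_of_live

print(f"DVD {dvd_share:.0%}, GNOME live CD {gnome_overall:.0%}, KDE live CD {kde_overall:.0%}")
# DVD 75%, GNOME live CD 15%, KDE live CD 10%
```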

Right now, the project doesn't have a totally clear picture of whether its "average" user is deploying Fedora as a desktop system or server OS. Spevack says it's "safe to assume" that live CD users are probably using Fedora as a desktop, while DVD installs could be used for servers or desktop installs. "So that's a long way of saying we have a pretty healthy mix."

The Mugshot project also gathers data on popular applications. The data reflects how often applications are actually used, not just which apps are installed. According to Mugshot's application statistics page, Firefox leads the pack, followed closely by GNOME Terminal, Nautilus, Evolution, Evince, gedit, Thunderbird, and Totem.

With any luck, other distros will follow Fedora's lead in gathering data from users to help further improve Linux and determine where and how Linux is, and isn't, being used.


