Why Linux Is a Model Citizen of Quality Code


Coverity’s 2011 Open Source Integrity Report gives kudos to Linux for its high-quality code.

Coverity, a development testing product provider, has been kicking the code tires on open source projects since 2006, when the U.S. Department of Homeland Security initiated the Scan project. Coverity now owns and manages the Scan program, which is the largest public-private sector research project in the world focused on open source code integrity. The latest integrity report doesn’t call Linux “perfect,” but “model citizen of good quality” is pretty darn close.

With the Coverity Scan 2010 Open Source Integrity Report, Coverity started releasing details on specific open source projects. The Android kernel 2.6.32 (Froyo) got called out for having 359 software defects, 25% of which were considered high risk, with the potential to cause security and stability problems.

The 2011 report offers details on three big open source projects: Linux 2.6, PHP 5.3, and PostgreSQL 9.1. “These three projects represent a mix of software types, including system software, programming software, and application software, and a mix of codebase sizes ranging from just over 500,000 lines of code to almost 7 million lines of code,” the report says.

In the 2011 report, Coverity compared the code quality of open source projects to a representative sample of proprietary codebases. The proprietary code samples came from anonymous Coverity users from a variety of industries. The study found that open source code holds its own when compared to the quality of proprietary codebases of similar size.

Coverity wasn’t able to compare 2011 open source code quality to data from previous years, however. More defects were flagged this year, but Coverity attributes that to improvements in Coverity Scan and its technology rather than an increase in buggy code. Further, open source developers fixed 6,133 defects in 2011, which is an increase over 2010.

The latest report covers 37 million lines of code from 45 of the most active projects in Coverity Scan. According to Coverity, 32 of the projects were active prior to 2011, and many of them have been in Coverity Scan since its beginning. Only high- and medium-impact defects were used in Coverity’s calculations.

“While the average codebase size for the open source projects in our analysis is 832,000 lines of code, the range of codebases varied from under 100,000 lines of code to two projects with codebases totaling nearly 7 million lines of code,” the report explains. All of the projects had active developer support and participation, but the level of participation ranged anywhere from 1 to 100 users. Ten of the projects had fewer than 100,000 lines of code, 20 had 100,000 to 500,000, and the rest had anywhere from 500,000 to nearly 7 million.

The good news? The quality of open source software is above average for the software industry. But you probably already knew that.

The analysis measures defect density, which is the number of defects per thousand lines of code. Code with an average defect density of 1.0 or less is considered high-quality, and the average for open source in the 2011 Scan was 0.45. Control flow issues, which include code that never executes or that executes under the wrong conditions, were the most common defects this year.

How Linux 2.6 Measures Up

With 6,849,378 lines of Linux 2.6 code scanned, 4,261 outstanding defects were detected and 1,283 were fixed in 2011. The defect density of Linux 2.6 is 0.62, compared to 0.20 for PHP 5.3 and 0.21 for PostgreSQL 9.1. Keep in mind that the PHP 5.3 codebase, at 537,871 lines of code, is a fraction of the size of Linux 2.6, and PostgreSQL 9.1 has 1,105,634 lines of code.
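
For readers who want to check the arithmetic, here is a minimal Python sketch of the defect density calculation using the Linux 2.6 figures quoted above. The function and constant names are illustrative, not taken from the report, and the sketch assumes the density is computed from outstanding defects (dividing 4,261 by the scanned line count does reproduce the published Linux figure).

```python
def defect_density(defects: int, lines_of_code: int) -> float:
    """Return the number of defects per thousand lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


# Threshold quoted in the article: 1.0 or less counts as high-quality.
HIGH_QUALITY_THRESHOLD = 1.0

# Linux 2.6 figures quoted in the article (outstanding defects, lines scanned).
linux_density = defect_density(4_261, 6_849_378)
linux_is_high_quality = linux_density <= HIGH_QUALITY_THRESHOLD

print(f"Linux 2.6: {linux_density:.2f} defects per thousand lines")
print(f"At or below the high-quality threshold: {linux_is_high_quality}")
```

The same function can be applied to any project for which both a defect count and a line count are known; the article gives only densities, not defect counts, for PHP 5.3 and PostgreSQL 9.1.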

Still, PHP and PostgreSQL developers shouldn’t feel too smug. The Coverity report puts it in perspective, saying:

“Linux version 2.6 grew from 5.3 million lines of code to 6.8 million lines of code between December 2010 and December 2011, and the soon-to-be-released version 3.3 is reportedly 15 million lines of code. With this codebase size, it likely has few people who know enough of the code to feel comfortable triaging and fixing defects throughout the codebase. In addition, as more people are required to be involved in resolving defects it takes longer to fix all of the defects—and particularly in a large codebase. Even harmless defects may require a much larger effort to triage than in a smaller project.”

The 2011 report concludes that the lines between commercial and open source projects are blurring, and commercial projects shouldn’t be afraid to integrate open source code. When it comes to labeling open source software as something separate from — and unequal to — proprietary software, Coverity says those days are becoming a thing of the past.