May 11, 2010

Maddog Editorial: Reusable Code and What It Means to Your Company

In the FOSS community we have a saying that “good programmers write good code and great programmers 'steal' great code.”

Of course we do not advocate “stealing” anything, but the real meaning of the phrase can also be thought of as “not re-implementing the wheel.”

When I first started programming in 1969, there were no computer stores selling boxed software for nearly every need. Mathematicians, physicists, engineers, scientists, and commercial users of all types had to write their own solutions. To leverage their efforts they often donated their code to libraries run by organizations such as the Digital Equipment Corporation User Society (DECUS), SHARE (IBM's user group) and others. Later, as early networking appeared, software bulletin boards sprang up, and (still later) sites at universities held huge numbers of programs for free distribution.

While binary programs allowed users to run the applications they found, the real value was in having the source, since you could often leverage a small piece of a program to do other work. You did not have to think about how to translate the desired solution into a series of steps for the computer, nor did you have to spend time coding those steps in some computer language and then incrementally testing the result. You already had a coded algorithm that had been tested before.

Time moved on and the concept of “re-usable code” developed into subroutine libraries, and even into the Unix utilities such as sort(1), grep(1), spell(1) and other command-line programs. These libraries and commands were written, tested, re-written and improved over time until they were very, very efficient at what they did. Even if a library routine did not do exactly what you needed, if you had the source code you could implement that routine in a private library with the little twist you needed, and eliminate a lot of potential coding errors that might occur if you wrote the routine from scratch.
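As a small illustration of that composability (my example, not one from the column): a word-frequency report can be assembled entirely out of these well-tested utilities, without writing a single line of new parsing, counting, or sorting code.

```shell
# Count word frequencies using only reusable Unix tools.
printf 'the cat sat on the mat\nthe dog sat\n' |
  tr -s ' ' '\n' |   # split the input into one word per line
  sort |             # group identical words together
  uniq -c |          # count the size of each group
  sort -rn           # list the most frequent words first
```

Each stage was written, debugged, and tuned by someone else years ago; the only "new" work here is deciding how to connect them.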

Another issue with closed source has to do with loss of technology. I worked for Digital Equipment Corporation (DEC), which was purchased by Compaq, and Compaq was then “merged” with Hewlett-Packard. Each company should have inherited all of the technology of the purchased or merged companies. Practically speaking, however, I estimate that huge numbers of software projects were lost in the transitions from one company to another.

The pattern of creating closed source proprietary code in a large company often means that one part of the company re-invents the same “wheel” that another part of the company is developing. DEC often had two or more projects trying to solve the same problem, whether it was developing a database, a new graphics library or a new operating system. One project would be “selected” and the others would be abandoned. Just because a project was not selected, however, did not mean it lacked good ideas that another group could have adapted, if they had access to the source code of the doomed project. As it was, the software of the doomed project often “disappeared” and was never re-used.

One promise of object-oriented languages was the creation of large libraries of re-usable code. For the most part, this never appeared, perhaps because of the copyright, patent and licensing issues that encumber the sharing of code today, or perhaps because the lack of source code for those libraries did not allow the receiving party to check for Trojans, trapdoors and other issues.

In the FOSS world there are over 230,000 categorized applications available that could provide complete or partial solutions.

There is also the concept of teaching by example. In the early 1970s at the University of New South Wales in Australia there was a professor by the name of John Lions. John believed that the best way of teaching students how to write good code was to expose them to the code of good programmers. To facilitate this, John annotated the source code for Edition 6 of Unix, the kernel written mostly by Ken Thompson and Dennis Ritchie, commenting most of the code, then writing commentary on the algorithms selected.
Unfortunately, the source code licensing for AT&T Unix changed before John could publish his book, and for twenty years the only copies available were photocopies of the preliminary drafts John had given his students for review. And photocopies of those photocopies. And photocopies of those. In fact, it is rumored that John Lions' book is the most photocopied technical book of all time, second only to the Bible in photocopies overall.

Fortunately for us all, John's book exists today. In the late 1990s some friends of John went to the copyright holder and got permission for John to publish the twenty-year-old source code. You can find it listed as “Lions' Commentary on Unix,” with John Lions as the author and Peter Salus as the editor, and many people feel the book is still worth reading.

Unfortunately today there are too many cases of good code being locked behind the doors of closed source, proprietary companies and too few examples of good code being available for review.

Finally, I believe that computer science (and other) research has been hampered by the predominance of closed source distribution. In the early days of Unix, developers would gather at conferences with their magnetic tapes and take home the sources of research being done by the speakers, so the attendees could collaborate further on the research. Over the years, as binary-only versions of Unix came out, a speaker would talk about the techniques they used and even publish papers on those techniques, but the lack of source code meant that others had to re-implement the work, which hampered further collaboration.

Likewise, researchers who signed source code agreements with closed-source proprietary companies, believing that the large number of end users of that software would guarantee widespread usage of their research work, have found that the intermixing of their research with the sources of the proprietary company meant that their research was distributed only if the proprietary company allowed it to be distributed. Often good research and advanced development never saw the light of day.

Companies have, in the past, assumed that all the code written by their programmers had widespread value to their customers and protected it all under the mantle of closed source, releasing the code as “open source” only when there was a strong business case to make it “open.” Proving that value was often arduous, and therefore rarely pursued. Perhaps it is time to reverse the practice and make every piece of code open source unless there is a demonstrated business reason to keep it closed. Then more programmers can stop re-inventing the wheel.
