April 2, 2002

How to document your code

- By Jenn Vesperman -

Many programmers don't know what to write in code documentation, and the lack of documentation remains a frequent complaint about Open Source or Free Software programs. Common advice is "write what you'd want to see if you were reading the code," but that's vague and not entirely helpful. Programmers need specifics: clear guidelines, in categories and with reasons they can understand.

A mnemonic is also useful. I'm going to use the "how, when, where and why; what, which and who?" list of question starters.

What does it do?

This is the most important question. What does the program do? If I'm trying to fix a bug report that says "it doesn't foo the bar," I need to know if the bar is supposed to be fooed.

Write at least a sentence for each function, a paragraph or more for each module or group of functions, and at least a paragraph for the program itself. In these, explain what the code is supposed to do.

Add a comment every few lines of code stating what those lines are intended to do; put one in every time you move to a new step of the algorithm.
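The sketch below shows a one-sentence function description with a comment at each step of the algorithm. The function `mean_of_first()` and its precondition are hypothetical and exist only to show where the comments go. The `assert()` checks the documented precondition in debug builds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/*
 * mean_of_first(): returns the arithmetic mean of the first n values
 * in 'scores'. (Hypothetical example function.)
 * The caller must pass 0 < n <= scores.size().
 */
double mean_of_first(const std::vector<int>& scores, std::size_t n)
{
    // Check the documented precondition (active in debug builds only).
    assert(n > 0 && n <= scores.size());

    // Step 1: add up the first n scores.
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += scores[i];
    }

    // Step 2: divide as floating point so the fraction isn't truncated.
    return static_cast<double>(total) / static_cast<double>(n);
}
```

Called as `mean_of_first({2, 4, 9}, 2)`, it averages only the first two values and returns 3.0.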

How does it work?

This is only slightly less important than "what does it do?" Admittedly, it's often possible to answer this one by reading the code -- but only if the code isn't buggy.

If you have a loop counting from 1 to 20, I can't tell whether you're ignoring element zero accidentally or deliberately. The only way I can know for sure is if you've commented the code.
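One comment can settle that question. The sketch below assumes a hypothetical layout in which element zero holds a count rather than data. `DayCounts` and `total_tally()` are invented for the example.

```cpp
#include <array>

// Hypothetical layout: counts[0] holds the number of entries recorded,
// and counts[1]..counts[20] hold the tallies for days 1 to 20.
using DayCounts = std::array<int, 21>;

/*
 * total_tally(): sums the daily tallies in counts[1]..counts[20].
 */
int total_tally(const DayCounts& counts)
{
    int total = 0;
    // Deliberately start at 1, not 0: counts[0] is the entry count,
    // not a day's tally, and must not be added in.
    for (int day = 1; day <= 20; ++day) {
        total += counts[day];
    }
    return total;
}
```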

Any code where an error is likely should have a comment stating your intention. I suggest commenting places where an off-by-one error is possible, anywhere that uses pointers, and anywhere that you have complicated logic, regular expressions, or elegant code that another person might not understand.
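In pointer code, a comment stating the range convention answers the off-by-one question before anyone asks it. This is a minimal sketch: `count_char()` is a hypothetical helper. The half-open `[begin, end)` convention it documents is the one the C++ standard library uses for iterator ranges.

```cpp
#include <cstddef>

/*
 * count_char(): counts occurrences of 'c' in the range [begin, end).
 * (Hypothetical helper.) 'end' points one past the last character to
 * examine, so end - begin is the number of characters examined.
 */
std::size_t count_char(const char* begin, const char* end, char c)
{
    std::size_t n = 0;
    // Stop before 'end', never at it: *end is not part of the range
    // and must not be read.
    for (const char* p = begin; p != end; ++p) {
        if (*p == c) {
            ++n;
        }
    }
    return n;
}
```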

Why was it written this way?

Why a stack, not a list? Why C, not Python? Why repeat-until, not while? Good coding, like good gaming, is a series of interesting choices.

Record your choices, and record the reasons for them. Later maintainers (or you!) can then make informed decisions about updating and modifying the code. Circumstances change, and people forget why decisions were made the way they were. The only certain way to remember your reasons is to document them.
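The sketch below records a choice and its reasons right where the code lives. The `UndoHistory` class is hypothetical. Its reasons are the sort a later maintainer would want to see; they are not a claim that a vector is always the right stack.

```cpp
#include <string>
#include <vector>

/*
 * UndoHistory (hypothetical example): remembers previous versions of a
 * text buffer so the most recent change can be undone first.
 *
 * Why a stack, not a list: undo only ever needs the newest entry, so
 * last-in, first-out is exactly the access pattern required.
 * Why std::vector as the storage: push_back() is amortised O(1) and
 * pop_back() is O(1), with no separate node allocated per entry.
 * If we later need "redo" or browsing of old versions, revisit this.
 */
class UndoHistory {
public:
    void remember(const std::string& old_text) { states_.push_back(old_text); }

    // Returns false, leaving 'out' untouched, when there is nothing to undo.
    bool undo(std::string& out)
    {
        if (states_.empty()) {
            return false;
        }
        out = states_.back();
        states_.pop_back();
        return true;
    }

private:
    std::vector<std::string> states_;
};
```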

Also record bugfixes. Later on, someone will (not might!) be tempted to pull out an apparently useless bit of code that you put in to prevent baz problems in quux machines. And it took you a week to get it right.
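A bugfix comment works best right beside the fix, saying what broke and what would break again if the code were removed. The function `strip_trailing_cr()` and the line-ending bug it describes are a hypothetical example. The underlying behaviour is real: `std::getline()` stops at `'\n'` and discards it, so a `'\r'` left by Windows line endings stays in the string on systems that don't translate line endings.

```cpp
#include <string>

/*
 * strip_trailing_cr(): returns 'line' with any trailing carriage
 * return removed. (Hypothetical example.)
 *
 * Bugfix: this looks redundant on Unix, but files edited on Windows
 * end each line with "\r\n". std::getline() removes only the '\n', so
 * the leftover '\r' made "quit\r" fail to match "quit". Do not remove
 * this without testing on a file with Windows line endings.
 */
std::string strip_trailing_cr(std::string line)
{
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();
    }
    return line;
}
```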

"Why this way" comments are essential anywhere you choose to break programming guidelines, lest someone try to "fix" the code to fit the guideline. Besides, you might teach someone something.

Which part does what?

Back to our fooed bar. We have determined that fooing is, in fact, a feature of our program, and that the bug reporter is correct and fooing is failing. How do we fix it? Where do we find the foo module?

The most effective way from the maintainer's point of view is to have some sort of document with the source code which describes the overall structure of the code and explains which code modules support which user-view features.

This doesn't have to be a long document, and can be a simple list like:

Fooing the bar: foo.c, interface.c
Bazzing: baz.c, quux.h, interface.c

Use the same terminology in the features list as the users see, and include at least the main module for that feature in the code list.

Where do I find each part?

You've been thorough in documenting your code, and you've carefully written an index of features and modules. Your maintainer knows he's after foo.c. So where is it?

Include your installation guide, your "make install," or some similar guide in your technical document. Include the location of your makefile -- once the maintainer has fixed your code, she will need to build it!

If your feature index lists functions rather than files, you will need to list which file each function is in. You can list the functions as "foo() in foo.c" in the feature index.

Who wrote it?

Always add this. You never know when you'll get a call from a headhunter offering you a lucrative contract because of a piece of code they saw. Or for more mundane reasons -- someone might need to ask you about the code.

When was it written?

Add this, too. It might be useful to the headhunter -- and it also gives a guideline for what sort of machine you wrote it for. And it may give a clue as to why you put in that apparently useless delay loop.

Final words

This is only one approach to documentation -- most professional technical writers use much more structured and detailed approaches.

If you clearly answer each of these questions, the people who maintain your code will be much, much happier. And technical writers will be able to go through your code for their material, rather than making you explain it all.

Example program

/*
 * greet.cc
 * A 'Hello World' program written to demonstrate code commenting.
 * Written by Jenn Vesperman
 * March 2002
 */

/*
 * 'Hello World' was chosen because almost everyone is familiar with it
 * in some form or other, so the reader can almost ignore the code
 * and concentrate on the commenting.
 * C++ is chosen partly because it has two styles of comment. Where a language
 * offers two comment styles, one can be used for extensive blocks of comment
 * like this, for function and program descriptions, and for important
 * 'pay attention to this' comments.
 * This lets the other style be used for small, embedded comments that are
 * used simply to clarify code.
 */

/*
 * This program, having only one function, is too small for supplementary
 * documentation. It should just compile and run.
 * Compile with: g++ -o greet greet.cc
 */

#include <iostream>
#include <cstring>
#include <string>

/*
 * This function finds out who to greet, greets them, and exits if
 * the input is the string "quit" (or if the input runs out).
 */
int main()
{
	std::string name;

start:		// label to use with goto

	// Finding out who to greet
	std::cout << "Enter a name (one word only), or 'quit' to exit" << std::endl;
	std::cout << "> ";	// prompt

	/*
	 * Warning: While buffer overflow isn't a problem here, there's
	 * not actually any protection against, say, getting a gigabyte of
	 * input with no whitespace in sight. That alone could DoS
	 * a machine.
	 */
	// If input ends (EOF) or the read fails, stop; otherwise we would
	// loop forever, greeting the same name.
	if (!(std::cin >> name)) {
		return 0;
	}

	/*
	 * strcmp returns 0 (which reads as 'false') if the two strings are
	 * the same. Therefore we compare with 0 explicitly, rather than
	 * relying on 'if (!strcmp(...))', where the '!' looks like 'not
	 * equal' but actually means 'equal'.
	 * We're using strcmp() here for a reason: as an example.
	 * As such, we're making a point of mentioning the choice, for those readers
	 * who are wondering 'why?'
	 */
	// Is it quit? If not, greet, then return to start.
	if (std::strcmp(name.c_str(), "quit") != 0) {
		std::cout << "Hello " << name << std::endl;

		/*
		 * We use the 'goto' construct rather than a while or do-while
		 * because we need to demonstrate commenting when we break the
		 * rules. There's no other reason for it.
		 */
		goto start;
	} else {
		// If it is quit, exit with status 0, which the shell treats as success.
		return 0;
	}

	// Should never get here. Return a nonzero (failure) status if we do.
	return 1;
}

Jenn Vesperman is an Open Source coder and coordinator of Linuxchix.

