Linux.com

Feature: Migration

How to document your code

April 02, 2002 (9:00:00 AM)


- By Jenn Vesperman -
Many programmers don't know what to write in code documentation, and the lack of documentation remains a frequent complaint about Open Source or Free Software programs. Common advice is "write what you'd want to see if you were reading the code," but that's vague and not entirely helpful. Programmers need specifics: clear guidelines, in categories and with reasons they can understand.

A mnemonic is also useful. I'm going to use the "how, when, where and why; what, which and who?" list of question starters.

What does it do?

This is the most important question. What does the program do? If I'm trying to fix a bug report that says "it doesn't foo the bar," I need to know if the bar is supposed to be fooed.

Write at least a sentence for each function, a paragraph or more for each module or group of functions, and at least a paragraph for the program itself. In these, explain what the code is supposed to do.

Every few lines, add a comment stating what those lines are intended to do; put one in every time you move to a new step of the algorithm.
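Here is a sketch of both habits in C++, the language of the article's example program. It shows a one-paragraph description above the function and a short comment at each step. `median_of` is a hypothetical function made up for illustration.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

/*
 * median_of (hypothetical example): returns the median of a non-empty
 * list of integers. For an even count it returns the lower of the two
 * middle values, so the result stays an integer. Throws
 * std::invalid_argument if the list is empty.
 */
int median_of(std::vector<int> values)	// taken by value: we sort a copy
{
	// Step 1: reject input that has no median.
	if (values.empty()) {
		throw std::invalid_argument("median_of: empty input");
	}

	// Step 2: order the values so that "the middle" means something.
	std::sort(values.begin(), values.end());

	// Step 3: pick the middle element (the lower middle for even counts).
	return values[(values.size() - 1) / 2];
}
```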

How does it work?

This is only slightly less important than "what does it do?" Admittedly, it's often possible to answer this one by reading the code -- but only if the code isn't buggy.

If you have a loop counting from 1 to 20, I can't tell if you're accidentally or deliberately ignoring element zero. The only way I can know for sure is if you commented the code.
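Here is a minimal sketch of a comment that settles the question. It assumes a hypothetical histogram whose bucket 0 collects rejected samples. Both the layout and `count_valid` are invented for illustration.

```cpp
#include <cstddef>
#include <vector>

/*
 * count_valid (hypothetical example): totals the histogram buckets that
 * hold valid samples. Assumed layout: bucket 0 collects out-of-range
 * samples; buckets 1..n-1 hold the valid ones.
 */
long count_valid(const std::vector<long>& buckets)
{
	long total = 0;
	// Starts at 1 on purpose: bucket 0 holds rejected samples and must
	// not be counted as valid. This is not an off-by-one error.
	for (std::size_t i = 1; i < buckets.size(); ++i) {
		total += buckets[i];
	}
	return total;
}
```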

Any code where an error is likely should have a comment stating your intention. I suggest commenting places where an off-by-one error is possible, anywhere that uses pointers, and anywhere that you have complicated logic, regular expressions, or elegant code that another person might not understand.
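A regular expression, for example, is much easier to check against its comment than against itself. This sketch uses the standard `<regex>` header. `looks_like_iso_date` is a hypothetical helper. Its comment states both what the pattern matches and what it deliberately does not check.

```cpp
#include <regex>
#include <string>

/*
 * looks_like_iso_date (hypothetical example): true if text has the
 * shape YYYY-MM-DD. It checks the shape only, so "2002-13-45" passes;
 * calendar validation, if needed, must happen elsewhere.
 */
bool looks_like_iso_date(const std::string& text)
{
	// ^\d{4}    four-digit year at the very start
	// -\d{2}    dash, two-digit month
	// -\d{2}$   dash, two-digit day, then end of string
	static const std::regex iso_date(R"(^\d{4}-\d{2}-\d{2}$)");
	return std::regex_match(text, iso_date);
}
```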

Why was it written this way?

Why a stack, not a list? Why C, not Python? Why repeat-until, not while? Good coding, like good gaming, is a series of interesting choices.

Record your choices, and record the reasons for them. Later maintainers (or you!) can then make informed decisions about updating and modifying the code. Circumstances change, and people forget why decisions were made the way they were. The only certain way to remember your reasons is to document them.
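Here is a sketch of such a record, placed where the next maintainer will see it. `UndoHistory` and the reasons given are hypothetical. They were chosen to mirror the "why a stack, not a list?" question above.

```cpp
#include <stack>
#include <string>
#include <vector>

/*
 * UndoHistory (hypothetical example): remembers edits so they can be
 * undone newest-first.
 *
 * Why a stack, not a list: undo only ever touches the most recent edit.
 * std::stack says so in the type and offers no way to reach into the
 * middle of the history by accident.
 * Why std::vector underneath, not std::stack's default (std::deque): we
 * only add and remove at one end, and a vector keeps entries contiguous.
 * Revisit this choice if profiling says otherwise.
 */
class UndoHistory {
public:
	void record(const std::string& edit) { edits_.push(edit); }

	// Returns the most recent edit and forgets it; returns an empty
	// string when there is nothing left to undo.
	std::string undo()
	{
		if (edits_.empty()) {
			return "";
		}
		std::string last = edits_.top();
		edits_.pop();
		return last;
	}

	bool empty() const { return edits_.empty(); }

private:
	std::stack<std::string, std::vector<std::string>> edits_;
};
```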

Also record bugfixes. Later on, someone will (not might!) be tempted to pull out an apparently useless bit of code that you put in to prevent baz problems in quux machines. And it took you a week to get it right.
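Here is a sketch of a bugfix comment guarding an apparently useless line. The bug history in the comment is hypothetical. The behavior it describes is real: `std::getline` strips only the `'\n'`, so lines read from files with DOS/Windows `\r\n` endings keep a trailing `'\r'`.

```cpp
#include <string>

/*
 * strip_trailing_cr (hypothetical example): removes one trailing
 * carriage return, if present.
 *
 * Bugfix note (invented history, for illustration): this looks redundant
 * on Unix, where lines end in '\n' alone. It is here because std::getline
 * strips only the '\n'. Files saved with "\r\n" endings therefore left a
 * '\r' on every line, and name comparisons silently failed. Do not remove
 * it without testing against a CRLF file.
 */
void strip_trailing_cr(std::string& line)
{
	if (!line.empty() && line.back() == '\r') {
		line.pop_back();
	}
}
```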

"Why this way" comments are essential anywhere you choose to break programming guidelines, lest someone try to "fix" the code to fit the guideline. Besides, you might teach someone something.
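As an example, here is a sketch of a deliberate break from a hypothetical house rule that every `case` ends in `break`. The roles and permissions are invented; the comment is the point.

```cpp
#include <string>
#include <vector>

// Hypothetical access levels, lowest to highest.
enum class Role { Guest, Member, Admin };

/*
 * permissions_for (hypothetical example): lists what a role may do.
 * Higher roles inherit everything the lower roles can do.
 */
std::vector<std::string> permissions_for(Role role)
{
	std::vector<std::string> perms;
	/*
	 * Guideline break: our (assumed) style requires a 'break' at the end
	 * of every case. Here the cases fall through on purpose, so each role
	 * picks up the permissions of every role below it. Adding 'break's
	 * would silently take rights away from Admins and Members.
	 */
	switch (role) {
	case Role::Admin:
		perms.push_back("delete");
		// fall through
	case Role::Member:
		perms.push_back("post");
		// fall through
	case Role::Guest:
		perms.push_back("read");
		break;
	}
	return perms;
}
```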

Which part does what?

Back to our fooed bar. We have determined that fooing is, in fact, a feature of our program, and that the bug reporter is correct and fooing is failing. How do we fix it? Where do we find the foo module?

The most effective way from the maintainer's point of view is to have some sort of document with the source code which describes the overall structure of the code and explains which code modules support which user-view features.

This doesn't have to be a long document, and can be a simple list like:

foo    foo.c, interface.c
baz    baz.c, quux.h, interface.c

Use the same terminology in the features list as the users see, and include at least the main module for that feature in the code list.

Where do I find each part?

You've documented your code thoroughly and carefully written an index of features and modules. Your maintainer knows he's after foo.c. So where is it?

Include your installation guide, your "make install," or some similar guide in your technical document. Include the location of your makefile -- once the maintainer has fixed your code, she will need to build it!

If your feature index lists functions rather than files, you will need to list which file each function is in. You can list the functions as "foo() in foo.c" in the feature index.

Who wrote it?

Always add this. You never know when you'll get a call from a headhunter offering you a lucrative contract because of a piece of code they saw. Or for more mundane reasons -- someone might need to ask you about the code.

When was it written?

Add this, too. It might be useful to the headhunter -- and it also gives a guideline for what sort of machine you wrote it for. And it may give a clue as to why you put in that apparently useless delay loop.

Final words

This is only one approach to documentation -- most professional technical writers use much more structured and detailed approaches.

If you clearly answer each of these questions, the people who maintain your code will be much, much happier. And technical writers will be able to go through your code for their material, rather than making you explain it all.

Example program


/*
 * greet.cc
 *
 * A 'Hello World' program written to demonstrate code commenting.
 *
 * Written by Jenn Vesperman
 * March 2002
 */

/*
 * 'Hello World' was chosen because almost everyone is familiar with it
 * in some form or other, so the reader can almost ignore the code
 * and concentrate on the commenting.
 *
 * C++ was chosen partly because it has two styles of comment. Where a language
 * offers two comment styles, one can be used for extensive blocks of comment
 * like this, for function and program descriptions, and for important 
 * 'pay attention to this' comments.
 * This lets the other style be used for small, embedded comments that are
 * used simply to clarify code.
 */

/*
 * This program, having only one function, is too small for supplementary
 * documentation. It should just compile and run.
 * Compile with: g++ -ogreet greet.cc
 */

#include <iostream>
#include <cstring>
#include <string>
/*
 * This function finds out who to greet, greets them, and exits if
 * the input is the string "quit".
 */
int main()
{
	std::string name;

	start:		// label to use with goto

	// Finding out who to greet
	std::cout << "Enter a name (one word only), or 'quit' to exit" << std::endl;
	std::cout << "> ";	// prompt

	/*
	 * Warning: While buffer overflow isn't a problem here, there's
	 * not actually any protection against, say, getting a gigabyte of
	 * input with no whitespace in sight. That alone could DoS
	 * a machine.
	 */
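	if (!(std::cin >> name)) {
		// End of input (e.g. Ctrl-D) arrived before 'quit'. Without this
		// check the goto below would loop forever, repeating the last
		// greeting. Return 'false', as in the should-never-get-here case.
		return 0;
	}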

	/*
	 * strcmp returns 0 (false) if the two strings are the same.
	 * Therefore we compare the result with 0 explicitly, rather than
	 * writing 'if (!strcmp(...))', where the '!' wrongly suggests
	 * "not equal".
	 * We use strcmp() here deliberately, as an example: comparing the
	 * std::string with == would do the same job. We mention the choice
	 * for readers who are wondering 'why?'
	 */
	// is it quit? If not, greet then return to start
	if (!(std::strcmp(name.c_str(),"quit")==0)) {
		std::cout << "Hello " << name << std::endl;

		/* 
		 * We use the 'goto' construct rather than a while or do-while
		 * because we need to demonstrate commenting when we break the 
		 * rules. There's no other reason for it. 
		 */
		goto start;

	// if it is quit, exit with the return value 'true'.
	} else {
		return 1;
	}

	// Should never get here. Return false if we do.
	return 0;
}



Jenn Vesperman is an Open Source coder and coordinator of Linuxchix.


Comments on How to document your code


OT: Actually provide an architecture and user doc!

Posted by: Anonymous Coward on April 03, 2002 02:11 AM
I know this is off topic as far as the article is concerned. For those who care, however, here are a couple of very important points:

1) As a user, I might not care (that much) about the code documentation. Quite often, the implementation details are beyond my horizon anyway, documented or not. What I really need is a) user doc, b) FAQ, c) mailing list in order to get started, not get stuck and resolve remaining issues. As an example, I consider Linux quite awkwardly documented internally, yet with sufficient external user documentation available (not always provided by the kernel hackers themselves).

2) As an interested developer, I am looking for an architecture. Most of the time, there ain't any. Often the 'architecture' goes no further than a few ramblings and the original napkin everything fits on. Nowadays there's no excuse for not providing the basic goals, the essential features, the underlying third-party pieces, etc. I'd also expect a (set of) good UML diagram(s), and prefer that over well-documented code. The main reason is that the body of code is often too large to digest in a limited amount of time. This is hampered by most UML tools being proprietary and expensive. But check out ArgoUML for a free one.

3) I suspect two reasons for badly documented code: a) 'hacker' attitude (if you're dumb, this code ain't for you) and b) someone else could steal it, if they only understood it, for competing or proprietary purposes. To the coders: try to be honest about your motivation for not documenting things properly. Then try to understand why a) and b) are quite shortsighted arguments.

Last but not least, I can assure you that your project can only grow so much unless it is well documented. Even if you consider it a pain in the ass, it may be the most valuable asset of your project.

Many developers get some revenue from writing articles or books on their creations.
As another example, check out what the JBoss project does with user documentation: the docs are actually pay-for items.
Depending on where you stand on this policy, you will either:
A) Say that this attitude is the undoing of the project since people will avoid it, or
B) Consider it a serious business model.
But by feeling strongly about the issue, you will discover how important the documentation is for you.

Consider documentation promotional material.
Based on your project and its documentation, you might be the first to be asked for professional services around it.

#

Re:OT: Actually provide an architecture and user d

Posted by: Anonymous Coward on April 03, 2002 04:40 PM
"I am looking for an architecture."

The problem with all those fancy UML tools is that you have to maintain those diagrams.

Whenever you change the code, you have to go back to the drawing table and update the diagrams. This doesn't work. It's a hassle.

To me, the documentation should be kept in the code. I have had very good experiences with doxygen (www.doxygen.org), which can generate fully cross-referenced HTML documentation from your source code. It even generates clickable(!) inheritance and collaboration diagrams.
This is perfect. You can enhance the generated documentation by adding some special comments to the code, but you don't have to.

So, you write your documentation while coding. It's always up-to-date. And you never have to go back to the drawing table to update your diagrams.

This way, you can work any way you like. You can work top-down, bottom-up or anything in between at any moment.

-- Arend --
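As an editorial illustration of the in-code style described above, a Doxygen comment block looks roughly like this. `@brief`, `@param`, `@p` and `@return` are standard Doxygen commands. `make_greeting` is a hypothetical function borrowed from the theme of the article's greet.cc.

```cpp
#include <string>

/**
 * @brief Builds the greeting printed for a user (hypothetical example).
 *
 * @param name  The word the user typed; assumed to be non-empty.
 * @return      "Hello " followed by @p name.
 */
std::string make_greeting(const std::string& name)
{
	return "Hello " + name;
}
```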

#

Re:OT: Actually provide an architecture and user d

Posted by: Anonymous Coward on April 03, 2002 05:00 PM
"I can assure you that your project can only grow so much unless it is well documented."

Ballocks.
Your project can only grow so much unless it is well _engineered_.
This doesn't mean it _has_ to be documented at all. It means, first of all, dependency management. No header files should be included unless absolutely necessary. Whenever you need a certain class definition in a header file, see if you can forward declare it and use a reference, instead of including the header file for the entire class.
Secondly, it means simple, clean interfaces with well-chosen names. Whenever you see that the name of a class does not entirely cover its purpose, you'll have to go back and change its name.
Minimize commenting out pieces of code that you don't need anymore. You have CVS for that. Delete dead code, because it's a maintenance nightmare.
The code is _the_ most important entity in software engineering. So keep it clean and dust-free, just like your office ;)
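Here is a minimal single-file sketch of the forward-declaration technique this comment describes. In a real project, `Widget` and `Logger` (hypothetical names) would live in separate headers. Only the .cc files that call `Logger`'s members would include its full definition.

```cpp
// What widget.h would contain: Widget only refers to Logger,
// so a forward declaration replaces #include "logger.h".
class Logger;	// forward declaration (hypothetical class)

class Widget {
public:
	explicit Widget(Logger& log) : log_(log) {}
	Logger& logger() const { return log_; }
private:
	Logger& log_;	// a reference to an incomplete type is allowed
};

// What logger.h would contain. Only code that calls Logger's members
// needs this full definition; it sits here to keep the sketch self-contained.
class Logger {
public:
	int messages = 0;
	void log() { ++messages; }
};
```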

#

too much

Posted by: Anonymous Coward on April 03, 2002 03:37 AM
nobody should document that way or that much. comments should only describe things that are odd... programmers are fairly good about knowing when they've done something clever; they just think it might be a good exercise for the reader to figure out why it's so clever.

but real crit:
1. you shouldn't label syntax. telling the reader that a label is used with goto is silly.

2. you shouldn't document externals. it's not your job. telling the reader how strcmp works is futile.

i don't know why so many books say otherwise...

#

Re:too much -- I agree

Posted by: Kefaa on April 03, 2002 04:29 AM
The intent of documentation is to describe the concept and approach. If a developer needs to have the language explained, they should back away from the keyboard before someone gets hurt.

This has nothing to do with a hacker mentality or that someone may steal it. Explaining the language is beyond the scope of module documentation. An exception would be an unusual or undocumented language construct or extension where determining what it does is neither obvious nor explained within the language itself.

A good example is regular expressions where it is possible to figure out what is being done, but one line of documentation would explain what:
if [[ $var = fo@(?4*67).c ]];then ...
does.


 

#

Re:too much???

Posted by: Anonymous Coward on April 04, 2002 08:52 AM
I disagree. Those comments are helpful to newbies like me. I just learned about gotos and labels in C++ from that.

Before this, I'd thought they were only in BASIC.

#

Re:too much???

Posted by: Anonymous Coward on April 07, 2002 11:48 AM
Yes, but each and every program written shouldn't contain the information required to allow a newbie to learn the language used: that's incredibly wasteful of time and effort. Comments should be for other programmers. There are numerous sources available for newbies to learn the commands and syntax of a given language.

Comments should detail what you have done _conceptually_ (not syntactically) and why.

#

Re:too much???

Posted by: Anonymous Coward on May 17, 2002 04:03 AM
Do you want every program you could ever possibly glance at to have the 'stuff' to teach you c++? Do you want every book you ever read to teach you the meanings of the words in it, the difference between first and third person, active and passive voice...

#

In addition...

Posted by: Kefaa on April 03, 2002 04:14 AM
I would add that modules should include a modification log. Knowing who wrote a module can be helpful; knowing who rewrote it, more so. A simple three-column log (Date, Name, Description) can handle all but the most extreme changes. Documentation at the change point should also be updated.

All that being said, the difficulty often seen is that developers do not get rewarded for documentation. Without a reward (or penalty), there is little motivation for developers to look out for the next person. Instead I see code documented to remind the developer what they did, assuming that everyone will have the same understanding they do of the issues, solutions, and technology. Documentation tends to improve, much like coding, as developers mature. Poor developers are often poor at documentation. Good developers can be better, but it takes time and we know it.

Again, this is a case where there is no reward or penalty so it takes individual desire to maintain documentation over working the next issue. In a commercial environment getting it out is often more important business wise than fully documenting the code. Whether that is a good approach is a newsgroup in itself.

#

return values from main

Posted by: Anonymous Coward on April 03, 2002 06:10 AM
Where's the comment for the non-standard return values from main? The normal return value should be 0 (aka EXIT_SUCCESS) not 1.

#

Re:return values from main

Posted by: Anonymous Coward on April 05, 2002 01:00 AM
C99 has two comment styles as well.
The choice of language because of multiple comment styles is iffy at best.

#

Document the data

Posted by: Michael Cain on April 03, 2002 06:18 AM

Quoting Fred Brooks' The Mythical Man-Month, Chapter 9:

Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious.

Substitute "code" for flow charts and "data structures" for tables, or "methods" and "classes", to bring it up to date. What I most often find missing, and Lord knows I'm guilty of leaving it out myself, is the overall description of how this program is supposed to operate, and that generally means data organization. What are the structures (objects) used? How do they relate to one another? Are there invariant conditions on the collection of structures, and at what points must they be satisfied?


Give me that, and a bit of a roadmap about where code lives, and I've got a fighting chance. Otherwise I'm going to spend a lot of time and effort figuring out the basics, and I'm a heck of a lot more likely to break things if I have to make changes.
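Here is a sketch of what that description can look like at the top of a data structure. The invariants are written out in prose and also checked with `assert`. `RingBuffer` is a hypothetical example, taken neither from the article nor from Brooks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/*
 * RingBuffer (hypothetical example): fixed-capacity FIFO of ints.
 *
 * Invariants, true whenever no member function is running:
 *   1. slots.size() is the capacity, and it is greater than 0
 *   2. count <= capacity
 *   3. head < capacity; the oldest element is slots[head], and the
 *      next count-1 elements follow it, wrapping around at the end.
 */
struct RingBuffer {
	std::vector<int> slots;
	std::size_t head = 0;
	std::size_t count = 0;

	explicit RingBuffer(std::size_t capacity) : slots(capacity) { check(); }

	// States the invariants in executable form as well as in prose.
	void check() const
	{
		assert(!slots.empty());
		assert(count <= slots.size());
		assert(head < slots.size());
	}

	// Adds a value; returns false (and changes nothing) when full.
	bool push(int value)
	{
		if (count == slots.size()) {
			return false;
		}
		slots[(head + count) % slots.size()] = value;
		++count;
		check();
		return true;
	}

	// Removes and returns the oldest value. Precondition: count > 0.
	int pop()
	{
		assert(count > 0);
		int value = slots[head];
		head = (head + 1) % slots.size();
		--count;
		check();
		return value;
	}
};
```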

#

code is

Posted by: Anonymous Coward on April 03, 2002 08:09 AM
If someone does not know how to read code, that is their fault. Code should not be documented in any way other than javadoc.
If one does not know the language, they should stay away from the code. Code itself is writing. Documentation should be what annotations are in books. If proper naming is used, *no documentation* would be needed.

If code is structured in a sane way, using tools like doxygen will lead to quick understanding, considering coders are competent and are not lazy.

For God's sake, programming is not engineering.
peace,
perlpimp

#

Re:code is

Posted by: Anonymous Coward on April 03, 2002 10:30 PM
You're absolutely right.

Not documenting code is wonderful job security.

Not only can nobody else figure out what I was doing (so they have to ask me), but I myself can't figure it out either, six months later.

Which means I have to take time to do so. And, since I get paid while I am re-reading my old code, this is more money in my pocket.

So why bother to document? In fact, I'm thinking of deliberately introducing bugs that I would have to fix ......

#

Re:code is

Posted by: Anonymous Coward on April 04, 2002 03:39 AM
See how nasty you are... XP mandates that unused documentation be thrown out. I go further: people who don't know the nuts and bolts of the language of my code should not touch it. People who know the language, though, are all welcome to modify it. Documenting code is an excuse for poor programming practices; if your code is structured sanely, it will be easy to pick up from there on.
It is a completely different issue when a company tries to hire immigrant programmers for burger-flipping wages, or people right out of college *HOT* and *READY* for high-paying jobs, who have no interest in computing.

perlpimp.

#

Re:code is

Posted by: Anonymous Coward on April 04, 2002 05:21 AM
If I hired you to write code for me, and you didn't document what you wrote, you'd never work for me again. Of course, I'd let you know up front what's expected of you, but if you leave behind an entry in the obfuscation contest you won't get any references from me.

#

Re:code is

Posted by: Anonymous Coward on April 05, 2002 09:32 AM
Yes, dolts must document their code. It's like a guideway for computer-crippled people whose perception was contorted into thinking that programming is engineering. Blah. Computers talk; they have their own language. Learn it, use it. Stop making excuses for not knowing technology.

#

Re:doc is essential

Posted by: Anonymous Coward on April 04, 2002 10:23 AM
I know nothing (OK, OK, almost nothing) about C, yet I DID manage to fix a gkrell module so that it looks at my laptop battery instead of an imaginary one (wrong directory under /proc) that worked for the original developer. So your premise that only coders should check out code is wrong. I might not know how to write the darn thing, but once I've seen it, perhaps I can figure out what it actually does. This brings me to my example: the utility, though it consisted of several files, contained no documentation about what happens where, and the code was almost completely stripped of comments. If I hadn't already written my own shell script watching the battery in /proc (yes, I know, scripts are a bad choice for such things; that's why I downloaded that module in the first place), I could never have figured out where to look for what (in fact, I just grepped the .c files for "/proc/acpi/battery/" to find where it read the info from the system), and I would just have mailed the poor guy saying "hey buddy, your prog doesn't work at all!". Writing documentation and commenting the code is not only for the users' sake: it can help you get precise bug reports and even fixes.
PS.: coding, as in Computer Science, as in Polytechnic, as in Engineering. (a Medicine student speaking)

 

#

Re:doc is essential

Posted by: Anonymous Coward on April 05, 2002 09:30 AM
> PS.: coding, as in Computer Science, as in Polytechnic, as in Engineering. (a Medicine student speaking)

That's where your problem lies. You do not talk to the computer. You manage it, build it, whatever. You can't grok software, so you expect people to explain it to you. Most programmers are poor explainers, especially during a coding session. An external doc is good, one that explains the software's hardware access points, so ambiguous port writes and other things can be explained away. The code itself, loops and whatever, should remain intact, with no foreign (human) language spread around. Perhaps Shakespeare should've explained how a carriage works in his plays, eh? =) How about him explaining each line he wrote and what he meant? Romeo and Juliet would be twenty times the size and most people would never read it again. Perhaps I am smart, perhaps I am not, but I enjoy reading other people's code. Comments are like bugs on the road: they confuse me and get caught in my teeth. Computers should explain the code, if it comes to that. It took me about a week to learn 150000 lines of code once I ran it through doxygen.
Documenting is for computer cavemen, and pigs.
Write your code well, use computer powertools, learn from others, keep mind open. Make computer do the boring work.
I will not work for you if you are not amazed at what doxygen does.

#

Excess

Posted by: Anonymous Coward on April 03, 2002 10:26 AM
I think an excess of documentation is as bad as (or even worse than) a lack of documentation. In the article, it's the former. The documentation example in the article is like a novel, a boring novel.

#

Re:Excess--I disagree

Posted by: Anonymous Coward on April 03, 2002 11:41 AM
It's better to have something and not need it, than to need it and not have it. I'll take "too much" commenting over "not enough" or "none at all" any day.

#

Goto statement

Posted by: Anonymous Coward on April 03, 2002 12:38 PM
eww

#

Where to begin...

Posted by: Anonymous Coward on April 03, 2002 05:09 PM
Let's take a real-world scenario. I have a team of 6 developers getting paid $50/hour on a 100 kloc application. If one leaves, I hire another programmer to take his/her place. If there is no (or poor) code documentation, it takes the new developer that much longer to become familiar with specific method implementations, etc. I pay for that extra time as a substantial loss ($50/hour). With detailed code comments I am flattening out the learning curve and saving myself those extra dollars.

Documenting your code is not about helping those who aren't as smart as you, it is about lowering the total cost of development.

#

Literate programming

Posted by: Serge Wroclawski on April 03, 2002 08:48 PM
A programmer who wants well documented code should use a literate programming system such as WEB, CWEB, NoWEB or Nuweb.

- Serge Wroclawski

#

Difficult finding middle ground

Posted by: Anonymous Coward on April 03, 2002 10:16 PM
I agree that the example is over-documented, and I often find that worse than under-documentation. The approach I take is rather a middle ground, like a few others have mentioned here already:

Every function or method, no matter how small, should include the date it was written, the author and a brief description of the function's purpose, along with a change log.

Only document within the function if it is difficult to understand from the code exactly what it is doing.

My biggest gripe, though, is that it is not always easy to determine the flow of execution of a program, especially if it is really large with many modules. I am not a great fan of flow charts, but I do believe that producing some documentation that explains at least the core functionality, the data layout, the purpose of each module of the program and the dependencies goes a long way toward helping other programmers understand the code more quickly.

#

Re:Difficult finding middle ground

Posted by: Anonymous Coward on April 11, 2002 01:24 AM
I disagree: maintaining a changelog for each function is stupid.

That's what a Software Configuration Management system is for. CVS does this automatically for you.

Wim

#

Function comments

Posted by: Anonymous Coward on April 04, 2002 02:39 AM
Function Foobar does this, that and that, checking bar, foo and verifying that the blah is not foo. It calls function barf00 to actually perform this.
This function is called from blah, barg, brogoo, and anywhere that needs to foobar.
The blah parameters must be this and that, and the foo parameter must not be bloro.
Returns blah on success, -1 otherwise.

#

one problem not related to commenting

Posted by: Anonymous Coward on April 04, 2002 04:52 AM
strcmp() is not part of the standard template library, hence std::strcmp(blah.c_str(), "quit") would be incorrect. You just have to say strcmp(blah.c_str(), "quit")

#

Re:one problem not related to commenting

Posted by: Anonymous Coward on April 04, 2002 05:06 AM
Actually, technically, following ANSI C++, yes it is. Unless you've dumped the namespace std (i.e. using namespace std), you need to refer to ANY included library function with std::.

#

Re:one problem not related to commenting

Posted by: Anonymous Coward on April 05, 2002 03:11 AM
This is wrong. If you use the C header string.h it is correct that you should leave out std::, but the example program includes the C++ header cstring, which - in accordance with the C++ standard - places strcmp in std::.

Björn

#

Way too much

Posted by: Anonymous Coward on April 04, 2002 05:15 AM
What makes you think that having that many lines of comments makes the program any easier to read? In fact, I'd argue that because there's SO much extra stuff, this program is actually harder to read; the content is blurred.

Anybody remember the adage "don't say in a page what you can say in a sentence"?

I hope no programmer worth their salt takes these *recommendations* seriously, let alone OSS programmers!!

#

yeah -- too much

Posted by: Anonymous Coward on April 04, 2002 11:50 PM
I thought I did copious commenting -- yikes!

I start all my files (classes, code, scripts) with 2 lines:
# By: Me Month Year
# Purpose: 1 sentence frag that says it all

(If there's not a purpose for writing the code -- why write it?)

So, I like the idea of a 1-sentence high-level description of long functions. Maintaining code that has NO comments totally sucks! Writing code with no comments is unprofessional! Even if you are "too good for comments", have pity on us mere mortals.

#

Use long names instead of comments

Posted by: Anonymous Coward on April 05, 2002 02:59 AM
I've been working for over two years in server-side Java programming and learned a lot from my boss (and that's after about 15 years of programming).

One of the best points I learned from him is to avoid comments altogether. Instead we use long names for everything - variables, classes, methods, members, configuration parameters (xml file tags) etc...

Comments are not maintainable and will easily get out of sync with the actual code. When you code "top-to-bottom", you cut the task at hand down into stages, give each stage a very meaningful name by calling a function, and write that function, further subdividing the code so that no function is longer than a "baby-shiber", as we call it (no longer than one screen at a reasonable resolution).

Today's compilers are smart enough to optimize this code, and the maintainability means that you can take advantage of old code far more than if it were documented with out-of-date comments.

BTW, the same goes for our C++ brethren (the other team, which writes the main part of the product).
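Here is a minimal sketch of the style described in this comment. Each step becomes a function whose name says what it does, instead of a comment above an inline block. All three function names are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical example: descriptive names carry the explanation.
std::string remove_surrounding_whitespace(const std::string& text)
{
	// unsigned char parameter: std::isspace must not receive negative chars
	const auto is_space = [](unsigned char c) { return std::isspace(c) != 0; };
	auto first = std::find_if_not(text.begin(), text.end(), is_space);
	auto last = std::find_if_not(text.rbegin(), text.rend(), is_space).base();
	return first < last ? std::string(first, last) : std::string();
}

std::string convert_to_lower_case(std::string text)
{
	std::transform(text.begin(), text.end(), text.begin(),
	               [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
	return text;
}

bool user_typed_quit_command(const std::string& raw_input)
{
	return convert_to_lower_case(remove_surrounding_whitespace(raw_input)) == "quit";
}
```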

#

Read "Code Complete"

Posted by: Anonymous Coward on April 11, 2002 01:29 AM
Go out and buy or borrow "Code Complete" by McConnell. It's got much better examples than the poor commenting example provided in this article!

If the code requires comments, then it must be bad code and should be rewritten. The code should be able to live for itself.

#

Re:Read "Code Complete"

Posted by: Anonymous Coward on May 23, 2002 05:22 PM
Your second sentence shows you haven't read the book you recommend very carefully!

McConnell spends a whole chapter on commenting style and ends it by saying

'Good code is its own best documentation. As you are about to add a comment ask yourself "How can I improve the code so that this comment isn't needed?" Improve the code and then document it to make it even clearer.'

It's pretty clear that he thinks even good code is better commented, as he says comments can and should explain the why rather than the how.

#




 