The Kernel Newbie Corner: Kernel Debugging with proc “Sequence” Files–Part 3

1368

Finally, after twoinstallments of the basics of debugging with sequence files, we’re going to finish off by demonstrating the power of sequence files with an example that actually, well, sequences its output. And I’m going to admit up front that a bit of what follows is going to be speculation since much of the documentation on sequence files is either incomplete, inconsistent or, sadly, wrong. But we’ll muddle through, you’ll get to test everything, and the comments section will be open for folks to make suggestions and provide corrections and further enhancements. Additional coverage on sequence files can be found in Chapter 4 here.

(The archive of all previous “Kernel Newbie Corner” articles can be found here.)

This is ongoing content from the Linux Foundation training program. If you want more content, please consider signing up for one of these classes.

So What Are We About to Do?

Assuming you’ve come to grips with the previous two columns during which we showed how to use the sequence file implementation of proc files to do some basic module debugging, recall that the fundamental limitation of normal proc files is that they’re limited to displaying only up to a single page of output, a page being of kernel size PAGE_SIZE. The simpler forms of sequence files you saw in the early columns, while they were terrifically convenient, still have exactly the same limitation, which is why we’re going to cover this last variation, which allows you to beat that limit by defining an “iterator” for the output of your sequence file, and looping with that iterator to print as much output as your heart desires, one “item” at a time.

This is a common solution when you want to print, say, the contents of an entire kernel array of structures, or perhaps the entries in a linked list, where the full output would wildly exceed a single kernel page. If, however, you use a sequence file, you can define each output operation to print only a single data object, then just define your iterator to keep looping and printing, one at a time, until all your objects have been printed, the only size limitation being that the output when each of those items is printed can’t exceed a single page in size per item.

There is one thing to keep in mind–all of this iteration and looping is completely invisible to the user, who will still simply invoke a single command to list the proc file and watch all that output come streaming by with no idea that all that iterating is happening underneath.

The seq_file Routines

For each seq_file you want to create that supports actual iteration, you need to define the following four routines (listed in the kernel header file include/linux/seq_file.h:

 struct seq_operations {
void * (*start) (struct seq_file *m, loff_t *pos);
int (*show) (struct seq_file *m, void *v);
void * (*next) (struct seq_file *m, void *v, loff_t *pos);
void (*stop) (struct seq_file *m, void *v);
};

And what do each of these routines represent? Let’s discuss them one at a time.

First, when you attempt to list the contents of the sequence file, the “start” routine that you define is automatically invoked to initialize the iteration loop. By default, the first time the start routine is called, the offset position passed to that routine is zero, which allows you to initialize pointers, allocate objects, or whatever it is you need to do to prepare to start printing. But that’s not all.

If successful, the start routine is responsible for returning the address of the first “object” of many to be printed. This could be the first element in an array, the first object in a linked list, or whatever. But whatever it represents, that pointer must be set properly as it’s the value that will be used for printing later.

Next, the purpose of your “show” routine should be self-evident–given a pointer to some object, the show routine is responsible for, well, “printing” it, using any of the sequence file output routines discussed in earlier columns. The only limitation here is that a single “show” operation can’t print more than one page of output. And here’s where the fun starts.

Your “next” routine is responsible for bumping up the iteration information of your sequence file in not just one but two ways. That routine must increment your position offset (typically by one), and it must also increment the void pointer to refer to the next data object to be printed. It might seem redundant to have two iterators keeping track of your progress, but you will need them both, for reasons that will be clear later.

In addition, your “next” routine is responsible for checking if you’re finally out of objects to print, at which point it should return a NULL pointer to signify that there’s nothing left to print.

Finally, the “stop” routine is invoked after all objects have been printed, and it’s the responsibility of that routine to release any resources that were allocated initially by your “start” routine, and so on. All of this will become more obvious once we start working our way through the example. But there’s one ugly detail we need to cover first.

What If I Want to Print LOTS of Data?

And here’s a property of sequence files that seems to get little attention, so let’s deal with it now. As I mentioned earlier, the major benefit of this “sequencing” of output is to beat the kernel page limit, so as long as any single output operation from your “show” routine is less than a page, you’re fine. But that’s not the whole story.

It turns out that, as you keep printing one object after another, your total output (one printed item after another) is still being checked against that page limit and, if you’re about to exceed that limit, that’s when things get exciting.

If your next “show” operation would cause the total amount of output printed thus far to exceed a page, the sequence file terminates printing, calls your “stop” routine, cleans up and then immediately restarts printing by re-invoking your “start” routine with the offset position of where things left off, which means that you need to design all of your routines with the understanding that this stopping and re-starting has to be absolutely seamless and invisible to the user. You’ll see what I mean shortly once we get into the example.

(Note: This feature–the potential stopping and restarting in the middle of printing–is why so many beginning programmers, when testing their first sequence file, complain that something must be going wrong as their “stop” routine seems to be called more than once. They don’t realize that that is, in fact, normal behaviour, depending on the amount of output.)

So What’s the Example Going to Do?

To demonstrate all of this, let’s use a totally contrived example that does nothing but print even numbers, one number per “show” operation, as if this was a useful thing to do. In addition, let’s throw in some dynamic memory allocation and allocate a single integer which will store the current value of the even number, even though that’s a bit of overkill.

In addition, let’s add a module parameter that lets us specify how far we want to print. And, on top of all that, let’s throw in massive amounts of kernel debugging information to the /var/log/messages file so you can follow what happens as you enter and exit each routine.

The Example

Let’s call our module source file evens.c, and away we go with header files and our module parameter:

 #include <linux/moduleparam.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/proc_fs.h>
#include <linux/fs.h>
#include <linux/seq_file.h>
#include <linux/slab.h> // for kmalloc()

static int limit = 10;
module_param(limit, int, S_IRUGO);

Next, let’s declare a pointer to the integer we’re going to dynamically allocate to store the current even number. As I said, this is a bit of a contrived example and you’d normally be “printing” more interesting objects, but this is enough to demonstrate the basic principles:

static int* even_ptr; 

And now, one at a time, let’s define our four essential routines.

The “start” Routine

 static void *
ct_seq_start(struct seq_file *s, loff_t *pos)
{
printk(KERN_INFO "Entering start(), pos = %Ld.n", *pos);

if ((*pos) >= limit) { // are we done?
printk(KERN_INFO "Apparently, we're done.n");
return NULL;
}

// Allocate an integer to hold our increasing even value.

even_ptr = kmalloc(sizeof(int), GFP_KERNEL);

if (!even_ptr) // fatal kernel allocation error
return NULL;

printk(KERN_INFO "In start(), even_ptr = %pX.n", even_ptr);
*even_ptr = (*pos) * 2;
return even_ptr;
}

So what can we say about the above? Plenty:

  • This routine, and all of the others, will be printing all sorts of debugging information so you can keep up with what happens at each step.
  • While we expect the first invocation of the start routine to begin with an offset position of zero, we need to allow for the fact that (as we mentioned above), this represents a restart somewhere in the middle of our sequence, so under no circumstances should you count on that value being zero, and you should always compare it to the terminating value to see if it’s time to stop.
  • If we just entered this routine, it’s our responsibility to allocate whatever we might need (even if this is a restart). This suggests that, when we call the “stop” routine, it’s our job to release all acquired resources, even it it means that we need to reacquire them all over again on a restart.
  • Just for fun, print the address of the dynamically-allocated integer, because if we have to free and reacquire that space, it’s easy to see that you might not get the same space the next time.
  • Finally, based on whatever the value of that position offset is, you need to initialize your even number to the appropriate value. If this is a restart, it will be initialized to the value that represents where you need to pick up where you left off.

So far, so good? Moving on, then.

The “show” Routine

This routine is fairly straightforward–given the address of the “thing” you want to print, cast the pointer to the correct type, dereference it, then print it, based on whatever definition of “printing” you’ve decided on:

 static int
ct_seq_show(struct seq_file *s, void *v)
{
printk(KERN_INFO "In show(), even = %d.n", *((int*)v));
seq_printf(s, "The current value of the even number is %dn",
*((int*)v));
return 0;
}

All of that looks fairly reasonable, except for the really verbose output message that represents the output of the sequence file. Normally, you’d simply print the value and move on. Instead, we were deliberately wordy simply because I want to print so much output that I do exceed that page limit of output at some point, so you can see what happens when that occurs.

The “next” Routine

This routine represents what has to be done to move on to the next object to print:

 static void *
ct_seq_next(struct seq_file *s, void *v, loff_t *pos)
{
int* val_ptr;

printk(KERN_INFO "In next(), v = %pX, pos = %Ld.n", v, *pos);

(*pos)++; // increase my position counter
if ((*pos) >= limit) // are we done?
return NULL;

val_ptr = (int *) v; // address of current even value
(*val_ptr) += 2; // increase it by two

return v;
}

Things to notice about the above:

  • A major job of this routine is to detect when you have no data left to print, and return NULL when that happens.
  • If you’re not done, you have to bump up both the “offset” and the corresponding object pointer value, so that the next call to the “show” routine gets the correct value. Quite simply, it’s your job to keep those two values in sync with one another.

The “stop” Routine

Finally, the job of your stop routine is to do any necessary cleanup and release of system resources that might have been allocated in your start routine, but make sure you keep the following in mind.

This might not be the end of all printing. Instead, as we explained above, your stop routine might have been invoked simply because you were about to exceed that page limit, at which point your sequence file is “stopped,” then restarted with the current offset so that you can pick up where you left off.

In order to drive this home, we’ve added all sorts of debugging information so you can see clearly what your stop routine is doing:

 static void
ct_seq_stop(struct seq_file *s, void *v)
{
printk(KERN_INFO "Entering stop().n");

if (v) {
printk(KERN_INFO "v is %pX.n", v);
} else {
printk(KERN_INFO "v is null.n");
}

printk(KERN_INFO "In stop(), even_ptr = %pX.n", even_ptr);

if (even_ptr) {
printk(KERN_INFO "Freeing and clearing even_ptr.n");
kfree(even_ptr);
      even_ptr = NULL;
} else {
printk(KERN_INFO "even_ptr is already null.n");
}
}

The reason for all that output will become obvious shortly.

The Rest of the Module Source

Add this remaining code to your source file, and you’re ready to go:

 static struct seq_operations ct_seq_ops = {
.start = ct_seq_start,
.next = ct_seq_next,
.stop = ct_seq_stop,
.show = ct_seq_show
};

static int ct_open(struct inode *inode, struct file *file)
{
return seq_open(file, &ct_seq_ops);
};

static struct file_operations ct_file_ops = {
.owner = THIS_MODULE,
.open = ct_open,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release
};

static int
ct_init(void)
{
struct proc_dir_entry *entry;

entry = create_proc_entry("evens", 0, NULL);

if (entry)
entry->proc_fops = &ct_file_ops;

return 0;
}

static void
ct_exit(void)
{
remove_proc_entry("evens", NULL);
}

module_init(ct_init);
module_exit(ct_exit);

MODULE_LICENSE("GPL");

Your First Test Run

If you accept the default limit of just the first 10 even numbers, then build the module and load it, after which you should see a brand new file named /proc/evens. List that file to see the first 10 even numbers (starting with zero):

 $ cat /proc/evens
The current value of the even number is 0
The current value of the even number is 2
The current value of the even number is 4
The current value of the even number is 6
The current value of the even number is 8
The current value of the even number is 10
The current value of the even number is 12
The current value of the even number is 14
The current value of the even number is 16
The current value of the even number is 18

That seemed to work fine, but it’s the debugging output that went to /var/log/messages that’s interesting and a bit puzzling:

 Sep  6 15:11:34 localhost kernel: Entering start(), pos = 0.
Sep 6 15:11:34 localhost kernel: In start(), even_ptr = ffff88010b8bc6b8.
Sep 6 15:11:34 localhost kernel: In show(), even = 0.
Sep 6 15:11:34 localhost kernel: In next(), v = ffff88010b8bc6b8, pos = 0.
Sep 6 15:11:34 localhost kernel: In show(), even = 2.
Sep 6 15:11:34 localhost kernel: In next(), v = ffff88010b8bc6b8, pos = 1.
Sep 6 15:11:34 localhost kernel: In show(), even = 4.
... snip ...
Sep 6 15:11:34 localhost kernel: In next(), v = ffff88010b8bc6b8, pos = 8.
Sep 6 15:11:34 localhost kernel: In show(), even = 18.
Sep 6 15:11:34 localhost kernel: In next(), v = ffff88010b8bc6b8, pos = 9.
Sep 6 15:11:34 localhost kernel: Entering stop().
Sep 6 15:11:34 localhost kernel: v is null.
Sep 6 15:11:34 localhost kernel: In stop(), even_ptr = ffff88010b8bc6b8.
Sep 6 15:11:34 localhost kernel: Freeing and clearing even_ptr.
... hang on, why are we starting over here? ...
Sep 6 15:11:34 localhost kernel: Entering start(), pos = 10.
Sep 6 15:11:34 localhost kernel: Apparently, we're done.
Sep 6 15:11:34 localhost kernel: Entering stop().
Sep 6 15:11:34 localhost kernel: v is null.
Sep 6 15:11:34 localhost kernel: In stop(), even_ptr = (null).
Sep 6 15:11:34 localhost kernel: even_ptr is already null.

So what do we notice about the above, one step at a time:

  • Unsurprisingly, the first invocation of your start routine is passed an initial offset of zero.
  • The address of the dynamically-allocated integer to store my current even value is, on this 64-bit system, ffff88010b8bc6b8, which I’ll need to deallocate later when I’m done.
  • I can see the position offset and the even value increasing correspondingly, until the “next” routine finally notices that we’ve reached the end, at which point the “stop” routine is called.
  • The stop routine checks that the variable even_ptr still has a non-zero value, at which point that storage is freed and the pointer is set to NULL for a reason that will become clear shortly.

And this is where things get a bit strange since, if you look closely at the above debugging output, even though we’ve printed all 10 of our desired even numbers and are officially finished, the implementation of sequence files still calls the “start” routine a second time with the position offset of 10, and it’s the job of the start routine to notice that there’s nothing left to do and immediately return NULL.

Not only that, but it should be obvious that because you’re invoking the start routine a second time, you will be invoking your stop routine a second time as well, so you have to be careful that you don’t try to free that same dynamically-allocated space again. It may be that this is the expected behaviour, but I have to admit I was a bit surprised to see this happening, so make sure you take this into account and be very careful with any kernel-space dynamic allocation.

Note: If you don’t need to do any dynamic allocation, and that pointer represents simply the address of each existing data object in turn, then the above won’t have any effect. But the above was simply to warn you about the subtle errors that might creep into your sequence files if you don’t know about this.

And What If There’s LOTS of Output?

Finally, let’s see what happens if you unload that module, then reload it with a much larger limit of, say, 1000:

 # insmod evens.ko limit=1000 

This is what you might see in /var/log/messages as you list the contents of that proc file:

 ... things start off just as before ...
Sep 6 15:25:10 localhost kernel: Entering start(), pos = 0.
Sep 6 15:25:10 localhost kernel: In start(), even_ptr = ffff88010b8bc6d0.
Sep 6 15:25:10 localhost kernel: In show(), even = 0.
Sep 6 15:25:10 localhost kernel: In next(), v = ffff88010b8bc6d0, pos = 0.
Sep 6 15:25:10 localhost kernel: In show(), even = 2.
Sep 6 15:25:10 localhost kernel: In next(), v = ffff88010b8bc6d0, pos = 1.
Sep 6 15:25:10 localhost kernel: In show(), even = 4.
... much later, about to hit a kernel page limit,
so we stop printing and invoke the stop() routine ...
Sep 6 15:25:10 localhost kernel: In show(), even = 188.
Sep 6 15:25:10 localhost kernel: Entering stop().
Sep 6 15:25:10 localhost kernel: v is ffff88010b8bc6d0.
Sep 6 15:25:10 localhost kernel: In stop(), even_ptr = ffff88010b8bc6d0.
Sep 6 15:25:10 localhost kernel: Freeing and clearing even_ptr.
... only to invoke start() again, and pick up where we left off ...
Sep 6 15:25:10 localhost kernel: Entering start(), pos = 94.
Sep 6 15:25:10 localhost kernel: In start(), even_ptr = ffff88010b8bc6d0.
Sep 6 15:25:10 localhost kernel: In show(), even = 188.
Sep 6 15:25:10 localhost kernel: In next(), v = ffff88010b8bc6d0, pos = 94.
... much more output snipped ...

Notice what happened there? Apparently, after printing a certain amount of output, your sequence file realized that it was about to exceed the kernel page limit in terms of total output, at which point it called things off, put a stop to printing and invoked your “stop” routine to shut everything down. After which, of course, it turned around, re-invoked your “start” routine with the position offset where it left off and picked things up again.

It just so happens that the next memory allocation call grabbed the same location that had just been freed, but there’s no guarantee of that–it could have been a different address entirely.

And, as it did before, even when your sequence file hit the specified limit and was truly done printing all 1000 values, it insisted on restarting only to notice that, yes, this time it was really finished:

 Sep  6 15:25:10 localhost kernel: In show(), even = 1996.
Sep 6 15:25:10 localhost kernel: In next(), v = ffff88010b8bc6d0, pos = 998.
Sep 6 15:25:10 localhost kernel: In show(), even = 1998.
Sep 6 15:25:10 localhost kernel: In next(), v = ffff88010b8bc6d0, pos = 999.
... we're really and truly done now ...
Sep 6 15:25:10 localhost kernel: Entering stop().
Sep 6 15:25:10 localhost kernel: v is null.
Sep 6 15:25:10 localhost kernel: In stop(), even_ptr = ffff88010b8bc6d0.
Sep 6 15:25:10 localhost kernel: Freeing and clearing even_ptr.
... once again, restart only to notice nothing left to do ...
Sep 6 15:25:10 localhost kernel: Entering start(), pos = 1000.
Sep 6 15:25:10 localhost kernel: Apparently, we're done.
Sep 6 15:25:10 localhost kernel: Entering stop().
Sep 6 15:25:10 localhost kernel: v is null.
Sep 6 15:25:10 localhost kernel: In stop(), even_ptr = (null).
Sep 6 15:25:10 localhost kernel: even_ptr is already null.

Is It Always That Complicated?

Not necessarily. Note that I deliberately made this example more complicated than it had to be by dynamically allocating a single data pointer for no good reason, at which point I had to look after it constantly.

It may be that your sequence file is printing the entries in an array, and that the position offset very quickly gives you the address of the appropriate array element, so that there’s no need for dynamic allocation anywhere. That’s not uncommon, and it would obviously simplify your code. In cases like that, where the object pointer can be calculated directly from the position offset value, your start and stop code should be trivial.

Is There More?

Sure, why not. While we’re here, let’s beat this to death. First, notice that you’re effectively responsible for keeping track of two “iterators”–the position offset (which starts off at zero and typically is incremented by one each time, identifying which object you’re printing), and the “void *” object pointer, which is used to keep track of each object that you print.

While it might seem odd to need to handle two iterators, notice that you absolutely must keep them synchronized and up to date. The object pointer is necessary for the “show” routine to know which object to print, while the position offset value is critically important in case your sequence file needs to stop and restart in the midst of printing, since that’s the only value that tells your start routine where to resume. Without that value, your sequence file simply won’t work properly.

Another subtle but important point involves the “void *” pointer that is passed to your stop routine. If you examine both examples above, you’ll notice that while that value seems to persist over stops and restarts, when your stop routine is called for the final time, that value appears to always be NULL. That means that you cannot use that pointer to store the result of a dynamic allocation since, by the time you get to your final call to your stop routine, you’ve lost that value. (One suspects that that “feature” is the reason a number of sequence files don’t work properly at termination time.)

An additional point involves something you saw above–that even after your “next” routine detected that you were done and invoked your “stop” routine, you still ended up calling your “start” routine one more time, which immediately noticed that you were done, which then immediately invoked your “stop” routine yet again. This is a common source of sequence file errors, when someone deallocates some storage in their “stop” routine but doesn’t clear the pointer, then tries to deallocate that very same storage a second time. That’s why we insisted on setting the even_ptr pointer to NULL after the deallocation in your “stop” routine–to ensure that we didn’t try to erroneously free it a second time.

And here’s one final bit of information you might have some fun with. Note that most of your routines above accept, as a first argument, the pointer to the sequence file structure in question:

 struct seq_operations {
void * (*start) (struct seq_file *m, loff_t *pos);
int (*show) (struct seq_file *m, void *v);
void * (*next) (struct seq_file *m, void *v, loff_t *pos);
void (*stop) (struct seq_file *m, void *v);
};

Not surprisingly, we ignore that struct seq_file* pointer for the most part, but if you’re interested in seeing what’s going on behind the scenes, you can examine the definition of that structure in the kernel header file include/linux/seq_file.h:

 struct seq_file {
char *buf;
size_t size;
size_t from;
size_t count;
loff_t index;
loff_t read_pos;
u64 version;
struct mutex lock;
const struct seq_operations *op;
void *private;
};

Note how, if you’re interested, you can examine various fields in that structure at any time, such as the size of the buffer and how full it is at any time. There’s even a private pointer of type “void *” you’re welcome to use if you want to store some data of that type — perhaps a reasonable place to keep track of some dynamically-allocated data. If you’re feeling ambitious, extend the example shown above by printing even more debugging information as each routine is called.

Some Real-Life Examples

If you want to examine some actual sequence files in the current kernel source tree, take a look at some of the source files under fs/proc. As one example, peruse the source in the file devices.c, which is the implementation of the /proc/devices file:

 $ cat /proc/devices
Character devices:
1 mem [0, 256]
4 /dev/vc/0 [0, 1]
4 tty [1, 63]
4 ttyS [64, 32]
5 /dev/tty [0, 1]
... etc etc ...

What’s interesting about that example is that, if you examine the “start” and “next” routines for that sequence file:

 static void *devinfo_start(struct seq_file *f, loff_t *pos)
{
if (*pos < (BLKDEV_MAJOR_HASH_SIZE + CHRDEV_MAJOR_HASH_SIZE))
return pos;
return NULL;
}

static void *devinfo_next(struct seq_file *f, void *v, loff_t *pos)
{
(*pos)++;
if (*pos >= (BLKDEV_MAJOR_HASH_SIZE + CHRDEV_MAJOR_HASH_SIZE))
return NULL;
return pos;
}

you’ll notice that the position offset value is being used simultaneously as the object address pointer value being returned. All this represents is an implementation that is so trivial and straightforward that that same value can do double-duty–not at all uncommon depending on what you’re trying to do.

So Is All of This Correct?

That’s a good question. As I admitted up-front, I’m fairly sure I’ve got the semantics right, but I’m willing to be corrected, so test the above and leave observations in the comments section. And check back a few times since I might tweak this piece based on any feedback.