To see what JPGraph can do, let's look at the executable binary files in the /usr/bin directory. I'll exclude symbolic links, and I'll also omit the more than 130 files in the /usr/bin/X11 subdirectory. My purpose isn't to be comprehensive, just to show what JPGraph can do. Specifically, I'll use JPGraph to look at three basic questions:
Is there a relation between a binary executable's file size and the number of shared libraries that it uses? I used ls -l to get file sizes and ldd [filename] to count the number of shared libraries each file uses.
When was each binary executable last accessed? To get the last access times, I used the command stat -c %X [filename].
How many files use each shared library? I used the ldd command to get shared library listings and then counted up all the hits I got on each shared library.
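The data gathering for these three questions can be scripted. Here is a minimal Python sketch of that step, not the script used for this article: it skips symbolic links and subdirectories, takes each file's size and last access time from os.lstat (the same values ls -l and stat -c %X report), and parses ldd output into library names. The function names are hypothetical, and the parser counts every entry ldd prints, including the dynamic loader and any virtual (vdso) library line, so its totals may differ slightly from counts made another way.

```python
import os
import stat
import subprocess
from collections import Counter


def parse_ldd(output: str) -> list[str]:
    """Return the library names listed in ldd's standard output.

    Scripts and statically linked binaries make ldd report
    "not a dynamic executable"; those yield an empty list.
    """
    if "not a dynamic executable" in output:
        return []
    names = []
    for line in output.splitlines():
        line = line.strip()
        if line:
            # "libm.so.6 => /lib/libm.so.6 (0x4002a000)" -> "libm.so.6"
            names.append(line.split()[0])
    return names


def scan(directory: str = "/usr/bin"):
    """Yield (path, size_bytes, atime, libraries) for each regular file."""
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        st = os.lstat(path)               # lstat does not follow symlinks
        if not stat.S_ISREG(st.st_mode):  # skip symlinks and subdirectories such as X11
            continue
        result = subprocess.run(["ldd", path], capture_output=True, text=True)
        # ldd exits non-zero for files that are not dynamic executables
        libs = parse_ldd(result.stdout) if result.returncode == 0 else []
        yield path, st.st_size, int(st.st_atime), libs


def report(directory: str = "/usr/bin") -> Counter:
    """Print size, library count and atime per file; return hits per library."""
    per_library = Counter()
    for path, size, atime, libs in scan(directory):
        print(f"{size}\t{len(libs)}\t{atime}\t{path}")  # questions 1 and 2
        per_library.update(libs)                        # question 3
    return per_library
```

Running per_lib = report() prints one tab-separated line per file, ready to save as a plain text file, and per_lib.most_common(50) then lists the 50 most widely used libraries.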
I printed the data for the first two questions to plain text files. For the third, I used a MySQL database for more flexibility.
While looking at the graphs that result, I'll also comment on some of the formatting features offered by JPGraph. I'll jump around a bit, but seeing the features in action shows their usefulness better than talking about them separately.
Graph 1: Is there any relation between file size and the number of shared libraries used?
Figure 1 shows the results of running the ldd command on the /usr/bin directory. I've also used this graph to showcase some of the features JPGraph offers.
You'll notice that there are green, red and black circles, all in slightly different sizes. What's going on there? JPGraph lets you define callback functions that alter the color and size of your plot points according to the Y-axis value or, in this case, the Y2-axis value. Black circles represent files that use 0-9 shared libraries, green circles represent files that use 10-20, and red circles represent files that use more than 20.
I could have simply used the same color (and size) for all the Y2-axis data points, but then the results wouldn't be so obvious. This way, you can immediately see that the green circles are heavily outnumbered by the red circles, which in turn are heavily outnumbered by the black circles.
Any circles in the pink cross-hatched area represent executables that use 10-20 shared libraries. Also, because of the way I defined the callback function, any circles lying in the cross-hatched area are green. Circles lying above the cross-hatched area represent executables that use more than 20 shared libraries, although, of course, they don't all use the same ones.
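The rule behind the three colors is small enough to state as code. This Python sketch shows only the logic, not JPGraph's PHP callback: it maps a file's shared-library count (its Y2 value) to a marker size and color using the bands described above. The marker sizes are illustrative guesses, since the exact sizes used in Figure 1 aren't given.

```python
def mark_style(num_libs: int) -> tuple[int, str]:
    """Return (marker_size, color) for a file that uses num_libs shared libraries.

    Bands: 0-9 black, 10-20 green, more than 20 red.
    The marker sizes are illustrative, not the values used in Figure 1.
    """
    if num_libs < 10:
        return 3, "black"
    if num_libs <= 20:
        return 4, "green"
    return 5, "red"
```

Note that a single input number decides both the size and the color, which mirrors the callback limitation discussed in the conclusion.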
Notice that the Y-axis uses a logarithmic scale. That was necessary because the file sizes range from less than 100 bytes all the way up to somewhere between 1 and 10 megabytes. One megabyte is roughly 10 to the sixth power bytes, and JPGraph labels it 10^6 rather than 1,000,000 because that is easier to read.
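On a base-10 logarithmic axis, each power of 10 gets the same amount of space, so what matters is which decade a file size falls in. The Python sketch below (an illustration, not something JPGraph exposes) computes that decade with integer arithmetic, which avoids floating-point surprises at exact powers of 10. A zero-byte file has no position on a log scale at all.

```python
import math


def decade(size_bytes: int) -> int:
    """Return k such that 10**k <= size_bytes < 10**(k + 1)."""
    if size_bytes < 1:
        raise ValueError("a log scale cannot show sizes below 1 byte")
    return len(str(size_bytes)) - 1   # number of digits minus one


def log_position(size_bytes: int) -> float:
    """Height on a log10 axis, e.g. 1,000,000 bytes sits at 6 (10^6)."""
    return math.log10(size_bytes)
```

A 50-byte script lands in decade 1 and a 3,000,000-byte binary in decade 6, so their decades differ by only 5 even though one file is 60,000 times the size of the other.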
So, after gathering the data and figuring out how to present it, what did I learn? The first thing I learned is that there are 1,253 files in my /usr/bin directory, excluding the X11 subdirectory, that are not symbolic links. It turns out that around 450-500 of these files are not dynamic executables but, presumably, text scripts that call other executable files. These files are represented by the black circles on the zero line of the Y2-axis.
Perhaps I should have excluded such files from consideration. However, they do not affect the main idea that I get from looking at the graph. Although files linked to more than 20 shared libraries (the red circles) are slightly more numerous among the first 600 files than among the next 600, the pattern is not nearly consistent enough for us to say that smaller files are generally linked to fewer libraries. What I can conclude is that the majority of the binary executable files lie below the pink cross-hatched area, which means that they use fewer than ten shared libraries.
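The comparison between the first 600 files and the next 600 comes down to a simple count. Here is a Python sketch of it, assuming the data is a list of (size in bytes, shared-library count) pairs; the function name is hypothetical.

```python
def share_over_20(pairs):
    """Sort (size_bytes, num_libs) pairs by size, split them in half, and
    return the fraction of files using more than 20 libraries in each half."""
    ordered = sorted(pairs)          # tuples sort by size first
    half = len(ordered) // 2

    def fraction(group):
        if not group:
            return 0.0
        return sum(1 for _, n in group if n > 20) / len(group)

    return fraction(ordered[:half]), fraction(ordered[half:])
```

For example, with eight made-up files, share_over_20([(500, 0), (1200, 25), (3000, 4), (8000, 22), (20000, 7), (90000, 30), (200000, 3), (2000000, 12)]) returns (0.5, 0.25). A gap like that in a tiny sample means little; the graph is what shows whether a pattern holds across all the files.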
Before moving on, note that the graph in Figure 1 also showcases many of JPGraph's features, including:
Using TrueType fonts. I used one named Bazooka for my X- and Y-axis titles.
Shading under the line graph from one x value to another x value. I shaded under the graph in a subdued yellow color to highlight the files that lie between 10^4 bytes and 10^5 bytes in size.
Shading an entire vertical strip from one x value to another x value. I shaded in very light brown the strip covering all the files between 10^3 and 10^4 bytes, and had the program work out where the strip should start and stop.
Using a gradient color scheme for the margins, while leaving the plot area in a solid color. The blend colors I used were red and black, but you can specify other colors.
Using alpha blending to specify a transparency value between zero and one. A typical value for me is around 0.5. In Figure 1, I used it so that the circles still show through the vertical fills in two different sections. If the vertical fills simply covered the circles up, that would defeat the purpose of the graph.
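Alpha blending is a weighted average of two colors. Here is the arithmetic for a single pixel as a Python sketch, assuming JPGraph's convention that the number is a transparency (0 fully opaque, 1 fully invisible). It illustrates the general compositing technique, not JPGraph's internal code.

```python
def blend(fill, background, transparency):
    """Composite an RGB fill color over an RGB background color.

    transparency = 0.0 shows only the fill; 1.0 shows only the background.
    """
    if not 0.0 <= transparency <= 1.0:
        raise ValueError("transparency must be between 0 and 1")
    opacity = 1.0 - transparency
    return tuple(round(opacity * f + transparency * b)
                 for f, b in zip(fill, background))
```

With a transparency of 0.4, white drawn over pure red comes out as (255, 153, 153), a pink that still shows the red underneath.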
Graph 2: When were binary executables last accessed?
To answer this question, I decided that a scatter plot would help us see when files were last accessed. I also decided to check file sizes, since a multi-megabyte file that hasn't been accessed in two years might be more of a candidate for deletion than one that uses only 100 kilobytes. To plot this information, I put the last access time of each file on the X-axis (as a Unix timestamp, in seconds since the epoch), its size in bytes on the Y-axis, and the cumulative file count on a second Y-axis. To enhance the graph, I added the Tux logo after tweaking it slightly in the GIMP.
How many files are in each cluster? The Y2-axis was designed to answer that question. The blue triangles, with their counts printed above them, show, for example, that 293 files are represented by all the red squares stacked between Dec 15, 2002 and Feb 11, 2003.
More generally, one can see that the cumulative file count (the orange area) grows relatively slowly until the very far end of the graph, at January 6, 2005. The graph shows that 477 files were last accessed before November 9, 2004; the remaining 776 of the 1,253 files were accessed after that date.
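The cumulative count is just the number of files last accessed at or before each point on the X-axis. A minimal Python sketch, assuming the last access times are Unix timestamps such as those from stat -c %X: sort them once, then answer each query with a binary search. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def accessed_before(sorted_times, cutoff):
    """Count files whose last access time is strictly earlier than cutoff."""
    return bisect_left(sorted_times, cutoff)


def cumulative_counts(times, points):
    """For each X-axis point, count files last accessed at or before it."""
    ordered = sorted(times)
    return [bisect_right(ordered, p) for p in points]
```

Sorting costs O(n log n) once, after which each point on the axis costs only O(log n), so computing the whole orange area stays cheap even with many sample points. A figure like the 477 files before November 9, 2004 is a single accessed_before call with that date's timestamp as the cutoff.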
What conclusions can be drawn from this graph? What strikes me is how long those vertical strips of squares are. File size (the Y-axis value) doesn't appear to make much difference: files of all sizes seem to have roughly the same distribution of last access times. I am not exactly sure why no files have a last access time earlier than Dec 15, 2002, but that may have been when I installed the system.
This particular graph allowed me to experiment with callback functions for formatting the text labels on the X-axis. For this graph, the Y and Y2 plots were required to share the same x values so that they could be overlaid. However, I couldn't use dates in the form December 13, 2002 as x values, because JPGraph couldn't figure out how to order them by time. Instead, I used the Unix timestamp as the x value and then used a callback function to reformat each label into a human-readable date.
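The label callback only has to turn a number back into a date string. Here is a Python version of that idea; in JPGraph itself the callback would be a PHP function attached to the X-axis (through the axis's SetLabelFormatCallback method, as far as I know), so treat this as an illustration of the logic only. It formats in UTC for reproducibility, where a real graph might use local time, and it uses its own month names so the output does not depend on the system locale.

```python
from datetime import datetime, timezone

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def date_label(timestamp: float) -> str:
    """Format a Unix timestamp (seconds since the epoch) as e.g. 'Dec 15, 2002' (UTC)."""
    d = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return f"{MONTHS[d.month - 1]} {d.day}, {d.year}"
```

For example, date_label(1039910400) returns "Dec 15, 2002". Because the plotted values stay numeric, the points are ordered and spaced correctly by time; only the labels change.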
Figure 2 also allowed a few other niceties, such as:
Using the text feature of JPGraph to place text at an arbitrary location on the graph while specifying its color and transparency. The words Cumulative File Count are text that I placed in white using a custom TrueType font, with a transparency setting of 0.4 so that any red squares underneath show through.
Drawing grid lines only at the major divisions of the Y-axis and making them dotted. Because the Y-axis is logarithmic, this formatting made it easy to keep track of the Y-value at each power of 10.
Using the tab title feature to display the title of the graph. The text color, background color and frame color can all be set -- I used magenta, black and green.
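On a logarithmic axis, the major divisions that carry those dotted grid lines are just the powers of 10 spanning the data. JPGraph computes its own tick positions, so the Python sketch below only illustrates where such lines fall; the function name is hypothetical.

```python
def major_ticks(min_value: int, max_value: int) -> list[int]:
    """Return the powers of 10 from the one at or below min_value
    up to the one at or above max_value (inputs must be >= 1)."""
    if min_value < 1 or max_value < min_value:
        raise ValueError("need 1 <= min_value <= max_value")
    tick = 1
    while tick * 10 <= min_value:   # largest power of 10 not above min_value
        tick *= 10
    ticks = [tick]
    while tick < max_value:         # keep going until max_value is covered
        tick *= 10
        ticks.append(tick)
    return ticks
```

For file sizes running from 80 bytes to 5,000,000 bytes, this gives the seven lines 10, 100, ..., 10^7.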
Graph 3: How many files use the shared library files?
This graph highlights more of JPGraph's abilities. The previous graphs were 640 by 480, but for this graph I needed more vertical space, so I made it 480 by 740. Even so, I had to confine myself to the top 50 shared libraries.
Near the top of the list are libm.so.6, used by 414 files, and libdl.so.2, used by 250, followed by the rest.
I designed the graph so that it would be readable and understandable without X- and Y-axis titles. I chose a rotated graph type: a horizontal bar graph. The bars are blended from green to blue, which makes consecutive bars stand out from one another. I decided to put the value inside each bar, instead of beside it, because the farther you get from the top, the harder it is to tell exactly which X-value you are at. I almost tilted the names of the shared libraries at a five-degree angle, but I decided that hurt readability slightly. The SetLabelAngle method takes one argument: the number of degrees, positive or negative.
If you look at the graph in Figure 3, you'll notice that, because the bars decrease in length, there's empty space on the right side. Rather than leave it blank, I placed a legend there. Red on a yellow background is what the JPGraph documentation and examples use, and I saw no reason to change that.
One last thing: for the other two graphs, I used simple text files for my data storage. For this one, I used MySQL. If I wanted to graph more than the top 50 libraries, all I would need to do is change my query. That's the kind of power and flexibility that MySQL can provide when working with JPGraph. With text files, varying the display would be much harder.
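The flexibility comes from letting the database do the counting. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table name lib_usage and its columns are hypothetical, not the schema used for Figure 3. Changing the limit parameter is all it takes to graph a different number of libraries.

```python
import sqlite3


def load(rows):
    """Build an in-memory database from (file, library) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lib_usage (file TEXT, library TEXT)")
    conn.executemany("INSERT INTO lib_usage VALUES (?, ?)", rows)
    return conn


def top_libraries(conn, limit=50):
    """Return (library, file_count) rows, most widely used first."""
    return conn.execute(
        """SELECT library, COUNT(DISTINCT file) AS files
           FROM lib_usage
           GROUP BY library
           ORDER BY files DESC, library
           LIMIT ?""",
        (limit,),
    ).fetchall()
```

The same SQL should carry over to MySQL, though the usual Python MySQL drivers use %s placeholders rather than ?.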
Conclusion
Of course, JPGraph doesn't have everything. For example, I would like the ability to do three-dimensional graphs. A three-dimensional bar graph can show relationships that might be impossible to observe in two-dimensional graphs, such as last access time versus file size versus number of shared libraries. Another feature that needs enhancing is the callback function. In Figure 1, I used the callback function to set the color and size of the filled circles. The problem is that I was able to use only one number to determine both the size and the color. It would be nice if those could be determined by separate arrays. Similarly, I would like to be able to use different colors on the individual bars of a single bar plot instead of having to use multiple bar plots. Overall, I am not disappointed with JPGraph's functionality, but a few features could offer more fine-grained control.
You may not be interested in filesystems and how many shared libraries your system has, but you don't need to share my interests to appreciate JPGraph. Whatever your data, you might like to take advantage of JPGraph for your own data visualization. And with the comprehensive documentation and the hundreds of sample graphs that you can modify with your own data (located in http://localhost/jpgraph/src/Examples), JPGraph can have you up and running in no time.
Download JPGraph and see for yourself.
Glenn Mullikin is a professional Linux journalist.