July 2, 2009

Generating Graphs with gnuplot, Part 2

Article Source Linux Developer Network

In Part 1 of this series on gnuplot, there was a lot of talk about line colors, font settings, export scripts, and the like, but no actual graphs. Input for graphs can be read from the command line or, more commonly, from external files. These files are expected to contain columns of data with a separator character delimiting the columns. By default the delimiter is whitespace, but it can be set to a comma or a pipe (perhaps for SQLite data).
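
As a minimal sketch of changing the delimiter, the commands below assume a hypothetical comma-separated file named data1.csv; that file name is only an illustration and is not one of this article's example files.

```gnuplot
# Assumption: data1.csv is a hypothetical comma-separated data file.
set datafile separator ","
plot "data1.csv" using 1:2

# For pipe-separated data, such as the default output of the sqlite3 shell:
set datafile separator "|"

# Return to the default whitespace splitting.
set datafile separator whitespace
```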

Let's take the space-separated file data1.dat, shown below, as an example data file.

$ cat data1.dat
1 3 5 8 2
2 4 6 7 2
3 5 6 6 2
4 6 2 5 4
6 10 1 4 8
10 8 4 4 8
11 7 8 5 2
12 6 8 6 8

The plot command is used to load a data file and generate a graph from its contents, or part thereof. The plot command can be quite simple to use, but it can also become quite complex if you want to use all of its options. The first thing to specify in the plot command is where to read the data from: a file path surrounded by double quotes. The using clause then specifies which columns to take from the file. In the plot below, I put the first column of data on the x-axis and the second column on the y-axis.

set style data lines
plot "data1.dat" using 1:2

The 1:2 is a column specification; in this case I want the X value to be the first column and the Y value the second column. The leftmost column in the data file is number 1. Column number 0 can be thought of as the line number. You can also perform arithmetic in a column specification by enclosing the expression in parentheses and referring to a column's value as $N. For example, the plot shown below makes the Y value the data from the second column of the data file multiplied by 0.1. In the same way you can combine multiple columns to show a ratio on the graph. Be aware that integer constants use integer arithmetic, so include a decimal point (1/5. rather than 1/5) if you do not want the result truncated to a whole number.

plot "data1.dat" using 1:($2 * 0.1)  
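
The following sketch illustrates the points above using data1.dat. The particular columns chosen for the ratio are only an example.

```gnuplot
# Integer constants use integer arithmetic; a decimal point forces real arithmetic.
print 1/5     # prints 0
print 1/5.    # prints 0.2

# Ratio of column 2 to column 3 on the Y axis.
plot "data1.dat" using 1:($2/$3)

# Column 0 as X: the points are numbered in file order, starting from 0.
plot "data1.dat" using 0:2
```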

A screenshot of the wxt terminal window that appears after the first plot command is issued is shown below. Notice that gnuplot has chosen an appropriate range for the X and Y axes.

Using the wxt terminal you can zoom in on a region of the graph by right-clicking on one corner and right-clicking again to define the area you want to zoom in on. The apply autoscale button in the toolbar will take you back to the default scaling. The toolbar buttons display hints telling you what they do when you hover the mouse over them. Alternatively, the "u" key will also unzoom the graph. Middle-clicking on the graph will place a small cross hair, labeled with its coordinates, at the location you clicked. Hitting the "r" key will turn on a cross hair extending off the graph in all directions from the current mouse pointer location. Hitting "r" again will turn off the cross hair. The "l" key will toggle a logarithmic scale on the Y-axis. Hitting the "L" key will toggle a logarithmic scale on the axis closest to the mouse pointer.
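
The same kinds of view changes can also be made from the gnuplot prompt. This is a rough sketch that assumes a plot has already been drawn; the range values are arbitrary and chosen only to fit data1.dat.

```gnuplot
# Restrict the visible region, similar to zooming with the mouse.
set xrange [2:6]
set yrange [4:10]
replot

# Restore automatic scaling on all axes.
set autoscale
replot

# Switch the Y axis to a logarithmic scale, then back again.
set logscale y
replot
unset logscale y
replot
```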


When you set many options or issue certain commands, the graph window might not be automatically redrawn for you. If you are uncertain, issue the replot command to force a redraw.

Since there are five columns of data in the data1.dat file, you might like to add in some of the other columns. The file name and using clause can be repeated many times in the same plot, separated by commas. To reuse the data file named in the previous clause, simply supply an empty string as the path to the data file, as shown below. This generates the screenshot shown below the plot command.

plot "data1.dat" using 1:2, "" using 1:3, "" using 1:4 
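
Each dataset in the list can also carry its own drawing style instead of relying on set style data. A small sketch, with the styles picked arbitrarily for illustration:

```gnuplot
# One plot command, three datasets from the same file, each with its own style.
plot "data1.dat" using 1:2 with lines, \
     "" using 1:3 with linespoints, \
     "" using 1:4 with points
```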


Using line segments to join data points explicitly shows the value of each data point on the graph, but can look quite busy. The smooth option to the plot command can be used to draw bezier or cspline curves through the data instead of rigid line segments. The bezier smoothing loses more data precision than the csplines fitting. You can also use the acsplines smoothing, which lets you supply a third value telling gnuplot how much weight each point has. The acsplines curve will make relatively greater efforts to pass through points with greater weight. In the example below, I have supplied the weight as a third column using a constant expression in parentheses. I made the second line (red) use plain line segments so you can see the data points relative to how the acsplines are rendered. The third line (purple) has a fixed weight of 200 for each data point and the fourth (golden) only a weight of 1/5 for all points. Notice on the graph that the curve with 1/5 weights just shows a slump, whereas the curve with weight 200 (purple) goes through all the data points.

plot "data1.dat" using 1:2 smooth bezier, \
"" using 1:3, \
"" using 1:3:(200.) smooth acsplines, \
"" using 1:3:(1/5.) smooth acsplines

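The csplines option mentioned above needs no weight column. A minimal sketch, drawing the raw values of column 3 as points so the spline can be compared against them:

```gnuplot
# Raw data as points, with a natural cubic spline through the same column.
plot "data1.dat" using 1:3 with points, \
     "" using 1:3 smooth csplines
```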

If you want to generate a bar chart from the data instead of a line graph, use the histograms style as shown below. Notice that I have used shortcuts when specifying the using and title clauses for the second dataset. Since you have to specify using and title in that order, you can abbreviate them to just their first letters. The title clause sets how a dataset is described in the legend of the graph. Notice that the red graph is called foobar in the legend at the top right of the screenshot below.

set style data histograms
plot newhistogram "", "data1.dat" using 2, "" u 3 t "foobar"
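
As a variation, the sketch below labels each cluster of bars with the value from the first column and sets the spacing between clusters. The legend titles are placeholders of my own choosing.

```gnuplot
set style data histograms
# Clustered bars with a gap of one bar width between clusters.
set style histogram clustered gap 1
# xtic(1) labels each cluster on the X axis with the value from column 1.
plot "data1.dat" using 2:xtic(1) title "column 2", \
     "" using 3 title "column 3"
```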


There are a handful of other commands which you should really know before you start exporting your graphs for public use. Notice that the above bar chart has x and y axis labels and a graph title. The commands below were used to set these properties before the plot command. If you already have a plot, you can issue the commands below followed by replot to redraw the graph.

set xlabel "Sample"
set ylabel "Population"
set title "Specimen Survival Rates 2009"

And for bar charts, the below command can be used to fill in the bars.

set style fill solid 1.000000 noborder 
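
Putting these pieces together, a complete script might look like the sketch below. The file name survival.gp and the legend title "first" are hypothetical, and the half-density fill with a black border is just one alternative to the solid fill above.

```gnuplot
# survival.gp (hypothetical file name)
set xlabel "Sample"
set ylabel "Population"
set title "Specimen Survival Rates 2009"
set style data histograms
# Half-density fill with a black outline around each bar.
set style fill solid 0.5 border -1
plot "data1.dat" using 2 title "first", "" using 3 title "foobar"
```

Running the script with gnuplot -persist survival.gp should leave the graph window open after gnuplot exits.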

In Part 3, I'll start out using SQLite as the relational database and graph data from there before moving on to using PostgreSQL and pivot tables to transform relational data into column format for gnuplot.