October 27, 2008

Teach an old shell new tricks with BashDiff

Author: Ben Martin

BashDiff is a patch for the bash shell that can do an amazing number of things. It extends existing bash features, brings a few of awk's tricks into the shell itself, exposes some common C functions to bash shell programming, adds an exception mechanism, provides features of functional programming such as list comprehension and the map function, lets you talk with GTK+2 and databases, and even adds a Web server right into the standard bash shell.

There are no packages of BashDiff in the openSUSE, Fedora, or Ubuntu repositories. I'll build BashDiff 1.45 from source against bash 3.0 on an x86 Fedora 9 machine. While versions 3.1 and 3.2 of bash are available, the 1.45 BashDiff patch does not apply cleanly to either.

You can build the BashDiff shell to use shared objects that are loaded at runtime or elect to have these objects linked directly into the shell itself. If you have plenty of RAM you might like to use the all-inclusive BashDiff build so you don't have to remember to load extensions before using them. I'll compile two binaries, bash and bash+william, corresponding to the two approaches.

Because you will probably want to keep the normal bash binary from your Linux distribution, it is a good idea to use a private prefix when configuring BashDiff. This way you can create links in /usr/local/bin that point to the new binaries and keep the whole BashDiff installation in one known place.

The commands to patch and compile bash with BashDiff are shown below. First you must expand the standard bash 3.0 tarball and extract the patch from the compressed BashDiff tarball. Once the patch is applied, follow the normal configure and make steps. Notice that I supplied a prefix so that all of the BashDiff shell files are installed under one directory. I used the install-bin target to install just the binary files for the bash+william binary. The last four commands install the william.so shared object that the non-william version uses to dynamically load functionality.

$ mkdir bashdiff-build
$ cd bashdiff-build
$ tar xzf ../bash-3.0.tar.gz
$ cd ./bash-*
$ tar xzvf .../bashdiff-1.45.tar.gz
$ patch -p2 < bashdiff-1.45.diff
$ autoconf
$ ./configure --prefix=/usr/local/bashdiff
$ make
$ make bash+william
$ sudo make install
$ sudo make install-bin

$ cd examples/loadables/william
$ make
$ sudo make install
$ sudo ldconfig

One last comment about compiling BashDiff: You should install the development packages for SQLite (2.x), MySQL, PostgreSQL, GTK+2, GDBM, and Expat if you wish your BashDiff build to compile support for these features.

BashDiff extends the for, while, and until commands to accept optional then and else blocks after the main loop block. The then block runs when the loop finishes normally, and the else block runs when the loop is left with the break command. The else block is therefore useful when you want to know whether the loop was terminated early.
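
For comparison, here is a minimal sketch of how a stock shell detects an early exit without these blocks: a flag variable records whether break fired. It runs without BashDiff. The function name find_first_big and its threshold argument are hypothetical, chosen only for illustration, and the sketch does not show BashDiff's own syntax.

```shell
# Stock-shell sketch (no BashDiff needed): detect an early break with a flag.
# find_first_big is a hypothetical helper, not part of BashDiff.
find_first_big() {
    limit=$1; shift
    broke_out=no
    for n in "$@"; do
        if [ "$n" -gt "$limit" ]; then
            broke_out=yes      # remember that we left the loop early
            break
        fi
    done
    if [ "$broke_out" = yes ]; then
        echo "stopped early at $n"      # the case an else block would cover
    else
        echo "loop ran to completion"   # the case a then block would cover
    fi
}

find_first_big 10 3 7 12 5
find_first_big 10 3 7 5
```

The flag and the trailing if statement are the bookkeeping that the then and else blocks are meant to replace.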

Bash 3 includes the sequence generation expression {1..100}, which expands to a space-separated sequence of the numbers between 1 and 100. In standard bash you cannot set these minimum and maximum values using shell variables but must fall back to using the seq command. BashDiff allows you to use variables, as the following example shows:

$ max=15
$ echo {1..max}
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
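
The stock-bash fallback mentioned above can be sketched as follows. This runs without BashDiff and assumes only that the seq utility (part of GNU coreutils) is installed; the arithmetic for loop is an alternative that needs no external command. The variable name out is illustrative.

```shell
# Stock bash: brace expansion happens before variable expansion,
# so {1..$max} is not turned into a sequence. Two common workarounds:
max=15

# 1. Let the external seq command generate the numbers.
echo $(seq 1 "$max")

# 2. Use an arithmetic for loop and avoid the external process.
out=""
for ((i = 1; i <= max; i++)); do
    out="$out${out:+ }$i"    # add a separating space after the first number
done
echo "$out"
```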

The case command is extended to allow the use of regular expressions as well as the standard glob expressions for the matching pattern. Submatch results are made available in the SUBMATCH array. BashDiff also incorporates a feature from Ksh and Zsh -- the ability to fall through from one case match to the rest by terminating the case statement with ;& instead of the normal ;;.

BashDiff adds exception handling semantics with the try and raise commands. A try block is terminated with done that can include a case block to detect and handle errors. If an error is not handled in the done block then it is propagated upward to the next higher try block, as you would expect from C++ or Java exceptions.

For C programmers, BashDiff exposes versions of common string-handling functions from the C language. When you are used to using strcpy, strcat, strcmp, and strlen, having them available in bash means one less mental context switch. The strstr, str[c]spn, and strtok functions are also exposed, although their names have been changed slightly; for example, strspn becomes the accept function in BashDiff.

Speaking of the C language, BashDiff includes a sscanf function, which lets you read and break apart data. As there is no differentiation between string and numeric variables in bash you can only scan for strings (or single characters), but the BashDiff sscanf function includes two special percent operators that allow you to consume all characters that either match or do not match a given character class. This makes it easier to break apart input that is separated by white space (the %s) or when you have numeric, alpha, or alphanumeric data that is delineated in some fixed way. Shown below is a simple URL parser that uses the sscanf command.

$ url=http://www.linux.com/foo
$ sscanf $url '%[a-zA-Z]://%[^\/]/%[a-zA-Z0-9]' a b c
$ declare -p a b c
declare -- a="http"
declare -- b="www.linux.com"
declare -- c="foo"
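
For comparison, a rough stock-shell equivalent of the same split can be written with parameter expansion. This is only a sketch: unlike the sscanf character classes it does not validate what each piece contains, and the helper variable rest is an illustrative name.

```shell
# Stock shell: split a URL with parameter expansion instead of sscanf.
url=http://www.linux.com/foo
a=${url%%://*}     # scheme: drop everything from the first "://" onward
rest=${url#*://}   # remainder after the scheme separator
b=${rest%%/*}      # host: drop everything from the first "/" onward
c=${rest#*/}       # path: drop everything up to and including the first "/"
echo "$a $b $c"
```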

If your input parsing goes beyond what sscanf can give you, the match command can break apart a string using a regular expression and place the captured subexpressions into a bash array of your choice. In the example below I've added another directory to the end of the URL and now parse it using a regular expression. First I check for http or ftp URLs, then capture the domain name and first directory component. BashDiff places the result of the match into the array a. The first, second, and last elements in the array are special: they contain the prefix of the string that precedes the match, the entire part of the string that matches your regex, and the remainder of the string after the match -- so you can see what was skipped, what matched, and what was left over. The other array elements store the matching subexpressions from your regular expression.

$ url=http://www.linux.com/foo/bar
$ match $url '((ht|f)tp):[/]*([^\/]+)/([^\/]+)' a
$ declare -p a
declare -a a='([0]="" [1]="http://www.linux.com/foo"
[2]="http" [3]="ht" [4]="www.linux.com" [5]="foo" [6]="/bar")'
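
For comparison, stock bash (3.0 and later) offers the [[ string =~ regex ]] test, which stores the whole match and the subexpressions in the BASH_REMATCH array but does not record the unmatched prefix and suffix. The sketch below recovers those two pieces with parameter expansion. It runs without BashDiff, and the variable names re, whole, scheme, host, dir, prefix, and suffix are illustrative.

```shell
# Stock bash sketch: regex match plus the unmatched prefix and suffix.
url=http://www.linux.com/foo/bar
re='((ht|f)tp):[/]*([^/]+)/([^/]+)'   # keep the regex in a variable, unquoted in [[ ]]

if [[ $url =~ $re ]]; then
    whole=${BASH_REMATCH[0]}    # entire matched text
    scheme=${BASH_REMATCH[1]}   # first subexpression
    host=${BASH_REMATCH[3]}     # third subexpression
    dir=${BASH_REMATCH[4]}      # fourth subexpression
    prefix=${url%%"$whole"*}    # text before the match
    suffix=${url#*"$whole"}     # text after the match
    echo "prefix='$prefix' match='$whole' suffix='$suffix'"
    echo "scheme=$scheme host=$host dir=$dir"
fi
```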

From the functional programming side, BashDiff offers the arraymap command, which takes a command and one or more arrays as arguments. Your command is run on each element of the passed arrays. When two or more arrays are passed in, your function gets one element of each array as its positional parameters. For example, the adder function shown below is passed the first elements of the arrays a and b the first time it is called, the second elements of both arrays the second time, and so on.

$ a=(1 2 3) b=(4 5 6)
$ adder() { echo $(($1 + $2)); }
$ arraymap adder a b
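
To make the calling pattern concrete, here is a stock-bash sketch of what the description above implies for the two-array case. The function map2 is a hypothetical stand-in written for this illustration; it is not a BashDiff command, and it makes no claim about how arraymap itself formats its output.

```shell
# Hypothetical stand-in for arraymap, limited to exactly two arrays.
# Usage: map2 COMMAND ARRAYNAME1 ARRAYNAME2
map2() {
    local cmd=$1 ref1="$2[@]" ref2="$3[@]" i
    # Indirect expansion copies the arrays named by the caller.
    local -a xs=("${!ref1}") ys=("${!ref2}")
    for ((i = 0; i < ${#xs[@]}; i++)); do
        "$cmd" "${xs[i]}" "${ys[i]}"   # element i of each array becomes $1 and $2
    done
}

a=(1 2 3) b=(4 5 6)
adder() { echo $(($1 + $2)); }
map2 adder a b
```

With these inputs adder is called three times, with the argument pairs 1 4, 2 5, and 3 6.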

Also from functional programming is list comprehension, written ${var|command}. The command is run on the supplied var and the result replaces the expression during parameter expansion. A small collection of prefixes can be used to select a useful predefined command. For example, the command below uses the - prefix to split the string on a regular expression, returning the non-matching blocks as array elements. The input has many spaces after the first "and" and a single tab after the word "there". See the BashDiff documentation for full details of all the regex and glob splitting, case folding, and quote and white space handling shorthand modes.

$ in=$'and      there\twas singing'
$ for z in ${in|-[ \\t]+}; do echo $z; done

The minus, plus, and plus-minus shorthand operators allow you to pick apart a passed variable with regular expressions. The minus version uses the regular expression as the delimiter and returns the things between the regex matches as the results. With the plus version you specify what you want to be returned with the regex. The plus-minus version uses the ± character and returns both the matching and non-matching content in the results.

$ echo $url
http://www.linux.com/foo/bar
$ echo ${url|-/}
http: www.linux.com foo bar
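
For comparison, stock bash can split on a fixed single-character delimiter (though not on a regular expression) by setting IFS for one read command. The sketch below runs without BashDiff; the array names parts and fields are illustrative. Note the empty field that appears between the two adjacent slashes.

```shell
# Stock bash sketch: split on "/" using IFS and read -a.
url=http://www.linux.com/foo/bar
IFS=/ read -r -a parts <<< "$url"    # IFS is changed for this command only

# Adjacent delimiters yield an empty field; keep only the non-empty ones.
fields=()
for p in "${parts[@]}"; do
    if [ -n "$p" ]; then
        fields+=("$p")
    fi
done
echo "${fields[@]}"
```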

Tune in tomorrow when we'll take a look at modifying positional parameters, parsing XML, talking to ISAM and relational databases, creating GTK+2 GUIs, and a few other tricks and issues.

