July 17, 2007

Parsing arguments for your shell script

Authors: Carl Albing, JP Vossen, and Cameron Newham

Suppose you want to have some options on your bash shell script, some flags that you can use to alter its behavior. You could do the parsing directly, using ${#} to tell you how many arguments have been supplied, and checking ${1:0:1}, the first character of the first argument, to see if it is a minus sign. You would need some if/then or case logic to identify which option it is and whether it takes an argument. What if the user doesn't supply a required argument? What if the user calls your script with two options combined (e.g., -ab)? Will you also parse for that? Lots of scripts have options, so the need to parse them is a common one. Isn't there a more standard way to do this?
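To make the hand-rolled approach concrete, here is a minimal sketch of that first-character test. The function name classify_arg is made up for illustration, and the ${1:0:1} substring expansion assumes bash rather than a plain POSIX shell.

```shell
#!/usr/bin/env bash
# classify_arg is a hypothetical helper, not part of the cookbook example.
# It inspects a single argument by hand, without getopts.
classify_arg() {
  if [ "$#" -eq 0 ]; then
    echo "no arguments"
  elif [ "${1:0:1}" = "-" ]; then
    # Leading minus sign: everything after it is the option text.
    echo "option: ${1:1}"
  else
    echo "plain argument: $1"
  fi
}

classify_arg -a        # option: a
classify_arg -ab       # option: ab
classify_arg file.txt  # plain argument: file.txt
```

Notice that -ab comes back as the single string "ab". Splitting combined flags, finding option arguments, and reporting errors would all be extra code you have to write yourself.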

This article is excerpted from the newly published book bash Cookbook.

The solution -- use bash's built-in getopts command to help parse options. Here is an example, based largely on the example in the manpage for getopts:

	#!/usr/bin/env bash
	# cookbook filename: getopts_example
	#
	# using getopts
	#
	aflag=
	bflag=
	while getopts 'ab:' OPTION
	do
	  case $OPTION in
	  a)	aflag=1
			;;
	  b)	bflag=1
			bval="$OPTARG"
			;;
	  ?)	printf "Usage: %s: [-a] [-b value] args\n" "$(basename "$0")" >&2
			exit 2
			;;
	  esac
	done
	shift $(($OPTIND - 1))

	if [ "$aflag" ]
	then
	  printf "Option -a specified\n"
	fi
	if [ "$bflag" ]
	then
	  printf 'Option -b "%s" specified\n' "$bval"
	fi
	printf "Remaining arguments are: %s\n" "$*"

There are two kinds of options supported here. The first and simpler kind is an option that stands alone. It typically represents a flag to modify a command's behavior. An example of this sort of option is the -l option on the ls command. The second kind of option requires an argument. An example of this is the mysql command's -u option, which requires that a username be supplied, as in mysql -u sysadmin. Let's look at how getopts supports the parsing of both kinds.

A call to getopts takes two arguments.

	getopts 'ab:' OPTION

The first is a list of option letters. The second is the name of a shell variable. In our example, we are defining -a and -b as the only two valid options, so the first argument to getopts has just those two letters -- and a colon. What does the colon signify? It indicates that -b needs an argument, as in -uusername or -ffilename. The colon must immediately follow each option letter that takes an argument. For example, if only -a took an argument we would need to write 'a:b' instead.
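As a quick check of the 'a:b' variation, the following sketch wraps the loop in a function so it can be called several times. The function name parse_a_colon_b and the output format are invented for this illustration; OPTIND is reset locally so each call starts parsing from the first argument.

```shell
#!/usr/bin/env bash
# parse_a_colon_b is a hypothetical demo function.
# With the option string 'a:b', -a takes an argument and -b stands alone.
parse_a_colon_b() {
  local OPTIND=1 OPTARG opt   # fresh parser state for every call
  while getopts 'a:b' opt; do
    case $opt in
      a) printf 'a=%s\n' "$OPTARG" ;;
      b) printf 'b is set\n' ;;
      *) return 2 ;;
    esac
  done
}

parse_a_colon_b -a red -b   # argument given as a separate word
parse_a_colon_b -ared -b    # argument attached directly to the letter
```

Both calls should print a=red followed by "b is set", since getopts accepts the option argument either attached or as the next word.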

The getopts built-in sets the variable named in its second argument to the option that it finds as it parses the shell script's argument list ($1, $2, etc.). If it finds an argument with a leading minus sign, it treats that as an option and puts the option letter into the given variable ($OPTION in our example). Then it returns true (i.e., 0) so that the while loop will process the option and then continue to parse options by repeated calls to getopts until it runs out of options: it reaches the end of the arguments, the first argument that is not an option, or a double minus (--), which lets users mark an explicit end to the options. Then getopts returns false (i.e., non-zero) and the while loop ends.
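The stopping rules are easier to see by experiment. This sketch, with a made-up function named show_options, collects the option letters getopts reports and then prints whatever arguments were left unparsed.

```shell
#!/usr/bin/env bash
# show_options is a hypothetical helper for observing when getopts stops.
show_options() {
  local OPTIND=1 OPTARG opt seen=''
  while getopts 'ab:' opt; do
    seen="$seen$opt"            # remember each option letter in order
  done
  shift $((OPTIND - 1))         # drop everything getopts consumed
  printf 'options=%s rest=%s\n' "$seen" "$*"
}

show_options -ab val file     # options=ab rest=file
show_options -a -- -b val     # options=a rest=-b val
show_options -a file -b val   # options=a rest=file -b val
```

The first call shows combined options being split for you. The second shows -- ending the options, and the third shows that parsing also stops at the first argument that is not an option, so the later -b is left alone.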

Inside the loop, when the parsing has found an option letter for processing, we use a case statement on the variable $OPTION to set flags or otherwise take action when the option is encountered. For options that take arguments, that argument is placed in the shell variable $OPTARG (a fixed name not related to our use of $OPTION as our variable). We need to save that value by assigning it to another variable because as the parsing continues to loop, the variable $OPTARG will be reset on each call to getopts.
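Here is a small sketch of why the copy matters. The function name save_optarg is invented; the point is only that the value is captured in bval at the moment -b is seen, before a later call to getopts has a chance to change $OPTARG.

```shell
#!/usr/bin/env bash
# save_optarg is a hypothetical demo function.
save_optarg() {
  local OPTIND=1 OPTARG opt bval=''
  while getopts 'ab:' opt; do
    case $opt in
      a) : ;;                   # nothing to record for the plain flag
      b) bval=$OPTARG ;;        # copy now; later calls may change OPTARG
      *) return 2 ;;
    esac
  done
  printf 'saved bval=%s\n' "$bval"
}

save_optarg -b alt -a   # saved bval=alt
```

Even though -a is parsed after -b, the saved copy still holds "alt" once the loop is finished.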

The third case of our case statement is a question mark, a shell pattern that matches any single character. When getopts finds an option that is not in the set of expected options ('ab:' in our example) then it will return a literal question mark in the variable ($OPTION in our example). So we could have made our case statement read \?) or '?') for an exact match, but the ? as a pattern match of any single character provides a convenient default for our case statement. It will match a literal question mark as well as matching any other single character.
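The difference between the bare ? pattern and an escaped \? can be tried out in isolation, with no getopts involved. The two function names below are made up for this comparison.

```shell
#!/usr/bin/env bash
# Hypothetical helpers comparing the two ways of writing the pattern.
match_unquoted() {
  case $1 in
    a) echo "a" ;;
    ?) echo "default" ;;                  # any single character
    *) echo "no match" ;;
  esac
}

match_escaped() {
  case $1 in
    a) echo "a" ;;
    \?) echo "literal question mark" ;;   # only the character ? itself
    *) echo "no match" ;;
  esac
}

match_unquoted '?'   # default
match_unquoted z     # default
match_unquoted zz    # no match
match_escaped '?'    # literal question mark
match_escaped z      # no match
```

Since getopts only ever hands the case statement a single character, the unquoted ? works as a catch-all there.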

In the usage message that we print, we have made two changes from the example script in the manpage. First, we use $(basename $0) to give the name of the script without all the extra pathnames that may have been part of how it was invoked. Second, we redirect this message to standard error (>&2) because that is really where such messages belong. The error messages that getopts itself prints when it encounters an unknown option or a missing argument are written to standard error. We add our usage message to that chorus.
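One way to confirm where the message goes is to capture each stream separately. In this sketch the usage function and the out and err variables are names chosen for illustration; the message text is the one from the example script.

```shell
#!/usr/bin/env bash
# usage is a hypothetical wrapper around the example's printf.
usage() {
  printf 'Usage: %s: [-a] [-b value] args\n' "$(basename "$0")" >&2
}

out=$(usage 2>/dev/null)       # keep only standard output
err=$(usage 2>&1 >/dev/null)   # keep only standard error

printf 'stdout: [%s]\n' "$out"
printf 'stderr: [%s]\n' "$err"
```

The stdout capture should come back empty, which means a caller piping your script's normal output into another command will not find the usage text mixed into it.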

When the while loop terminates, the next line to be executed is:

	shift $(($OPTIND - 1))

which is a shift statement used to move the positional parameters of the shell script from $1, $2, etc. down a given number of positions (tossing the lower ones). The variable $OPTIND is an index into the arguments that getopts uses to keep track of where it is when it parses. Once we are done parsing, we can toss all the options that we've processed by doing this shift statement. For example, if we had this command line:

	myscript -a -b alt plow harvest reap

then after parsing for options, $OPTIND would be set to 4. By doing a shift of three ($OPTIND - 1) we would get rid of the options, and then a quick echo $* would give this:

	plow harvest reap

So, the remaining (non-option) arguments are ready for use in your script (in a for loop perhaps). In our example script, the last line is a printf showing all the remaining arguments.
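As a rough sketch of that last step, the function below (demo_shift, a name invented here) parses the sample command line, reports $OPTIND, shifts, and then walks the remaining arguments with a for loop.

```shell
#!/usr/bin/env bash
# demo_shift is a hypothetical function that mimics the end of the script.
demo_shift() {
  local OPTIND=1 OPTARG opt arg
  while getopts 'ab:' opt; do
    :                           # options are ignored here; only OPTIND matters
  done
  printf 'OPTIND=%s\n' "$OPTIND"
  shift $((OPTIND - 1))         # discard the parsed options
  for arg in "$@"; do           # "$@" keeps each argument as its own word
    printf 'arg: %s\n' "$arg"
  done
}

demo_shift -a -b alt plow harvest reap
# OPTIND=4
# arg: plow
# arg: harvest
# arg: reap
```

Looping over "$@" rather than $* is a deliberate choice in this sketch: it keeps an argument that contains spaces intact as a single item.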
