Linux.com

Re:awk - still fastest for simple stuff

Posted by: Administrator on January 26, 2006 12:44 PM
I am presently working as a software testing consultant for a large financial services company. We use Perl for many of our ad hoc financial analysis tools, and even more of our production tools are written in Perl, though our main applications are written in C++ or Java. Nevertheless, when I need to quickly extract a few fields from a large data feed with well-defined fields, I have found Awk to be a useful tool, even though some other responders feel that Python, Ruby, or Perl are more modern tools.

One example of where I have used Awk to quickly produce results is when I want to get a list of securities from one of the market exchanges. Several of our vendors provide Web sites where we can download complete lists of information about securities. From those lists, I want just the market symbols and the exchanges where they are traded. From there, I want to create a four-part subject in various internal formats.

Here's a version of the kind of stuff I do, modified to protect the identity of the institution's information:

awk -f symbolExtract.awk ExchangeList > fourPartSubject

where symbolExtract.awk is my short Awk script and ExchangeList is the text file extracted from the vendor's Web site.

Here's what such a script looks like:

BEGIN { FS = "|" }
{ print "vendor.record." $1 "." $3 }

That's it. The script can go through twenty or thirty thousand records within a second or two and create a nicely formatted subject file. I can then easily change fields one, two, or four to run the file through various different data feeds. I can change the fields with Awk, too, or in the Vi editor, the Sed stream editor, Emacs, or any other convenient tool. Speed, ease of change, and flexibility are all there, and each of them is VERY IMPORTANT in our fast-moving business.
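
For anyone who wants to try this, here is a minimal sketch of a full run. The sample records, the assumption that field one is the symbol and field three is the exchange, and the names "otherfeed" and otherSubject are all made up for illustration; a real vendor file will look different.

```shell
# Hypothetical sample data: symbol|description|exchange (layout assumed)
cat > ExchangeList <<'EOF'
IBM|International Business Machines|NYSE
MSFT|Microsoft Corp|NASDAQ
EOF

# The two-line extraction script, saved to a file
cat > symbolExtract.awk <<'EOF'
BEGIN { FS = "|" }
{ print "vendor.record." $1 "." $3 }
EOF

# Build the four-part subjects, one per input record
awk -f symbolExtract.awk ExchangeList > fourPartSubject
cat fourPartSubject

# Change field one of each subject for a different feed.
# "otherfeed" is a made-up name; splitting on "." assumes symbols contain no dots.
awk 'BEGIN { FS = OFS = "." } { $1 = "otherfeed"; print }' fourPartSubject > otherSubject
cat otherSubject
```

The second awk call treats the subject itself as a dot-separated record, so swapping any one part is a single assignment. If the symbols can contain dots, it is probably safer to change the literal text in symbolExtract.awk and rerun it instead.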
