Posted by: Administrator on January 26, 2006 12:44 PM
I am presently working as a software testing consultant for a large financial services company. We use Perl for many of our ad hoc financial analysis tools, and even more of our production tools are written in Perl, though our main applications are written in C++ or Java. Nevertheless, when I need to quickly extract a few fields from a large data feed with well-defined fields, I have found Awk to be a useful tool, even though some other responders feel that Python, Ruby, or Perl are more modern tools.
Re:awk - still fastest for simple stuff
One example of where I have used Awk to produce results quickly is getting a list of securities from one of the market exchanges. Several of our vendors provide Web sites from which we can download complete lists of information about securities. From those lists, I want just the market symbols and the exchanges where they are traded, and from those I want to create a four-part subject in various internal formats.
Here's a version of the kind of thing I do, altered so that it does not identify the institution or its data:
awk -f symbolExtract.awk ExchangeList > fourPartSubject
where symbolExtract.awk is my short Awk script and ExchangeList is the text file extracted from the vendor's Web site.
Here's what such a script looks like:
BEGIN { FS = "|" }                       # input records are pipe-delimited
{ print "vendor.record." $1 "." $3 }     # four-part subject from fields 1 and 3
That's it. The script can go through twenty or thirty thousand records in a second or two and create a nicely formatted subject file. I can then easily change fields one, two, or four of each subject to run it through various data feeds. I can make those changes with Awk, too, or with Vi, the Sed stream editor, Emacs, or any other convenient tool. Speed, ease of change, and flexibility are all there, and each of them is VERY IMPORTANT in our fast-moving business.
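Both steps described above, generating the subjects and then changing one part of them with Awk, can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not the poster's actual tooling: the `symbol|name|exchange` record layout, the sample records, and the function names `make_subjects` and `set_subject_field` are hypothetical, and it assumes no part of a subject contains a dot.

```shell
#!/bin/sh
# Minimal sketch; the sample records and function names are hypothetical.

# make_subjects: read pipe-delimited records on stdin and print one
# four-part subject per record, built from fields 1 and 3.
make_subjects() {
    awk 'BEGIN { FS = "|" } { print "vendor.record." $1 "." $3 }'
}

# set_subject_field N VALUE: replace part N of each dot-separated subject
# read on stdin with VALUE. Assumes no part of a subject contains a dot.
set_subject_field() {
    awk -v n="$1" -v val="$2" 'BEGIN { FS = OFS = "." } { $n = val; print }'
}

# Hypothetical input in an assumed symbol|name|exchange layout.
sample='ABC|Example Corp A|NYSE
XYZ|Example Corp B|NASDAQ'

printf '%s\n' "$sample" | make_subjects
# vendor.record.ABC.NYSE
# vendor.record.XYZ.NASDAQ

printf '%s\n' "$sample" | make_subjects | set_subject_field 1 altvendor
# altvendor.record.ABC.NYSE
# altvendor.record.XYZ.NASDAQ
```

Assigning to a field makes Awk rebuild the whole record using OFS, which is why `set_subject_field` sets OFS to a dot as well as FS; without it, the rewritten subject would come out space-separated.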