
X.Org XDC2013 Announced For X, Wayland, Mesa

The X.Org Foundation has finally announced the details concerning the 2013 X.Org Developers’ Conference…

Read more at Phoronix

How to Use Awk to Find and Sort Text in Linux, GnuCash

awk is a splendid Unix scripting language for processing text files. The version included in many Linux distros is GNU awk, or gawk for short (some, such as Debian and Ubuntu, default to mawk instead). I like it for pulling data from ordered data sets, such as text lists and CSV exports from spreadsheets. awk sees each line in a file as a separate record, and each item in a line as a separate field, which makes it possible to slice and dice your files in all kinds of flexible ways. The classic way to illustrate this is with /etc/passwd; this example prints the whole contents:

$ awk '{ print $0 }' /etc/passwd 
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh

print with no arguments means “print the whole line”, and $0 means “whole line”, so omitting $0 gives the same result. Now suppose you want just a list of usernames and UIDs. /etc/passwd is perfect for parsing with awk, because its data fields are delimited with colons. So all you do is count from the left starting with 1 to number the fields you want, and then extract the usernames and UIDs like this:

$ awk -F":" '{ print $1 " " $3 }' /etc/passwd 
root 0
daemon 1
bin 2
sys 3

-F defines your field separator, and " " inserts a space. What if you want the UIDs listed first? Easy peasy, just swap the field numbers like this:

$ awk -F":" '{ print $3 " " $1 }' /etc/passwd
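A couple of awk built-ins make this kind of field shuffling easier to read. The sketch below is only an illustration: it feeds two lines in /etc/passwd format (taken from the listing earlier) through a pipe, and the variable names are made up.

```shell
# Sample lines in /etc/passwd format, taken from the listing earlier.
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh'

# OFS is the separator that print puts between comma-separated
# arguments, so no quoted " " is needed.
uid_first=$(printf '%s\n' "$passwd_sample" | awk -F: -v OFS=',' '{ print $3, $1 }')

# NR is the current record number and NF the number of fields in it.
counts=$(printf '%s\n' "$passwd_sample" | awk -F: '{ print NR ": " NF " fields" }')

printf '%s\n%s\n' "$uid_first" "$counts"
```

Leaving OFS at its default (a single space) gives the same output as concatenating " " by hand.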

Now let’s look at a real-life example of using awk to migrate a big wad of financial data into GnuCash.

Importing Data into GnuCash

GnuCash is one of the crown jewels of FOSS, a brilliant and powerful accounting program that runs on Linux, Mac, and Windows. You can import QIF and OFX files, but there is only partial support for CSV imports. Which is a shame, because CSV (comma-separated values) makes a good universal format for moving financial data from one program to another, and many financial applications support CSV import and export.

Let’s say you have this giant spreadsheet containing years of financial data, and you decide it’s time to put it in a proper accounting program. Or you want to move from a different accounting program into GnuCash, and it doesn’t support QIF or OFX, but it does support CSV export. You could retype all your data, or you could put your ace Linux scripting skills to work. The workflow is to first create a good clean CSV file, convert that to QIF, and then import the QIF into GnuCash. GnuCash is very picky about having a perfectly clean QIF file with no errors, so we’ll make sure to have that.

Make sure you work from a copy of your source file! Don’t muck up your original!

I’ll keep this example simple and use only the following QIF fields:

  • D – date
  • P – payee
  • M – memo
  • T – amount
  • N – check number, or any notation in the check number field
  • L – category, which corresponds to GnuCash accounts
  • ^ – end of record

We’ll also need to specify the account type in the QIF file header, like these examples:

!Type:Bank  
!Type:Cash  
!Type:CCard  
!Type:Invst

The QIF specification supports a lot more items, and you can see the full details here. If this page ever disappears just do a search for “qif spec” as it is abundantly documented.

Spreadsheets are fab for making mass changes, like date and number formats, so take a look at your CSV in a spreadsheet before trying to convert it to QIF. Make sure that your withdrawals have a minus sign, like -33.72, and don’t use dollar signs. Deposits can have a plus sign if you prefer, but it’s not required. All of your withdrawals and deposits need to be in a single column. This is what my final trimmed-down CSV export looks like:

11/03/2008	Copy Junction	Copy of building codes	-33.72	8732	Supplies
11/03/2008	Home Depot	Trowel	-17.05	8734	Tools
11/03/2008	Dewalt Service Center	Charger for Drill	-75.85	8735	Tools
11/04/2008	Building Supply	Margin trowel	-13.23	8736	Tools
11/05/2008	Jane Smith	invoice #5843	8,500.00	dep	income:contracting
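Before converting, a quick scan can flag rows with malformed dates or amounts. This is a rough sketch under several assumptions: the file is tab-delimited, the date is in field 1 in MM/DD/YYYY form, and the amount is in field 4. The check_rows name, the regular expressions, and the test rows are all made up for illustration and are not anything GnuCash prescribes.

```shell
# check_rows FILE: print a message for every row whose date or amount
# looks malformed. Hypothetical helper; field positions are assumptions.
check_rows() {
    awk -F '\t' '
        # Field 1 should look like MM/DD/YYYY.
        $1 !~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9]$/ {
            print "line " NR ": bad date: " $1
        }
        # Field 4 should be an optionally signed number with at most one
        # decimal point; thousands commas are tolerated here.
        $4 !~ /^[+-]?[0-9]+(,[0-9][0-9][0-9])*(\.[0-9]+)?$/ {
            print "line " NR ": bad amount: " $4
        }
    ' "$1"
}

# Made-up test rows: one clean, one with a doubled minus sign,
# one with a bad date and an extra decimal point.
tmp=$(mktemp)
printf '11/03/2008\tHome Depot\tTrowel\t-17.05\t8734\tTools\n'  > "$tmp"
printf '11/03/2008\tHome Depot\tTrowel\t--17.05\t8734\tTools\n' >> "$tmp"
printf '2008-11-03\tHome Depot\tTrowel\t17.0.5\t8734\tTools\n'  >> "$tmp"

problems=$(check_rows "$tmp")
printf '%s\n' "$problems"
rm -f "$tmp"
```

No output means every row passed these particular checks; it does not guarantee that the import will succeed.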

If there is even a single error anywhere in the QIF file the GnuCash import will fail. Some errors I’ve run into are multiple minus signs, like --33.72, extra decimal points, and incorrect date formatting. awk won’t care but GnuCash will. Alrighty then, let’s convert our nice CSV into a QIF file:

$ ( echo '!Type:Bank'; cat exportfile.csv | awk -F $'\t' '{ print "D" $1; print "P" $2; print "M" $3; print "T" $4; print "N" $5; print "L" $6; print "^"; }' ) > importfile.qif

And the result looks like this, with withdrawals indicated by minus signs and deposits with unsigned values:

!Type:Bank
D03/25/2008
PJane Smith
M invoice #4657
T4000.00
Ndep
Lincome:contracting
^
D04/02/2008
PFirst Bank of Money
MCheck Order
T-21.44
NACH
Lbank fees
^
D05/15/2008
PPretty Designs
MDesign Services
T-500.00
N8922
LContract Services
^

Note how to specify a tab field delimiter by using the normal escape sequence for a horizontal tab, \t. If all goes well you’ll have a good error-free QIF file to import into GnuCash. You can import this into a blank new GnuCash book, or into an existing book, and you’ll map the QIF categories to GnuCash accounts.
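The same conversion can also be written with a BEGIN block for the header, letting awk itself interpret '\t' so nothing depends on bash-specific quoting. This is a sketch, assuming the six-column tab-delimited layout used above; the tsv_to_qif name and the sample row are made up.

```shell
# tsv_to_qif FILE: write a QIF bank register to standard output.
# Hypothetical helper; assumes columns are date, payee, memo, amount,
# check number, category, separated by tabs.
tsv_to_qif() {
    awk -F '\t' '
        BEGIN { print "!Type:Bank" }   # account-type header, printed once
        NF >= 6 {                      # skip blank or short lines
            print "D" $1
            print "P" $2
            print "M" $3
            print "T" $4
            print "N" $5
            print "L" $6
            print "^"                  # end of record
        }
    ' "$1"
}

tmp=$(mktemp)
printf '11/03/2008\tHome Depot\tTrowel\t-17.05\t8734\tTools\n' > "$tmp"
qif=$(tsv_to_qif "$tmp")
printf '%s\n' "$qif"
rm -f "$tmp"
```

In real use the output would be redirected to a file, for example tsv_to_qif exportfile.csv > importfile.qif.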

Find Blocks of Text

awk is wonderful for finding blocks of text, and you might give it a try when grep isn’t picking up exactly what you want. For example, you can snag the complete lspci output for a single device:

$ lspci -v | awk '/VGA/,/^$/'
01:00.0 VGA compatible controller: NVIDIA Corporation G98 [GeForce 8400 GS] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Micro-Star International Co., Ltd. Device 1162
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
        I/O ports at dc00 [size=128]
        [virtual] Expansion ROM at fe9e0000 [disabled] [size=128K]
        Capabilities: 
        Kernel driver in use: nvidia
        Kernel modules: nvidia_current, nouveau, nvidiafb

The caret, ^, is a regular expression anchor that matches the start of a line, and $ matches the end, so /^$/ matches an empty line. The two patterns separated by a comma form a range: awk prints every line from the first one that matches /VGA/ through the next empty line, which is where lspci ends each device’s block. This is a great trick for extracting specific blocks from large configuration files with blank lines between sections, like this example from sshd_config:

$ awk '/X11Forwarding/, /^$/' /etc/ssh/sshd_config 
X11Forwarding yes
X11DisplayOffset 10
PrintMotd no
PrintLastLog yes
TCPKeepAlive yes
#UseLogin no
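A range pattern starts printing at the matching line, so anything above the match in the same block is left out. If the whole block is wanted, awk’s paragraph mode (an empty RS) is one alternative. The sketch below compares the two on a made-up config file; the file contents and variable names are illustrative only.

```shell
# A made-up config with blank lines between sections.
conf=$(mktemp)
printf '%s\n' \
    '# Network' 'Port 22' 'AddressFamily any' '' \
    '# Display' 'X11Forwarding yes' 'X11DisplayOffset 10' '' \
    '# Logging' 'SyslogFacility AUTH' > "$conf"

# Range pattern: start at the first line matching X11Forwarding and
# stop at the next empty line (which is printed too).
range_out=$(awk '/X11Forwarding/,/^$/' "$conf")

# Paragraph mode: an empty RS makes each blank-line-separated block one
# record, so the whole block containing the match is printed, including
# the lines above the match.
para_out=$(awk -v RS= '/X11Forwarding/' "$conf")

printf 'range:\n%s\n\nparagraph:\n%s\n' "$range_out" "$para_out"
rm -f "$conf"
```

Here the range version begins at the X11Forwarding line, while the paragraph version also includes the "# Display" comment above it.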

Remove Duplicates Without Sorting

The sort and uniq commands are the ones we usually turn to for finding and removing duplicate entries in a file. But maybe you don’t want your source file sorted or changed, so this is where trusty awk comes to the rescue by extracting the unique records and storing them in a new file:

$ awk '!x[$0]++' filewithdupes > newfile

Your original file won’t be changed, and the new file will have only unique entries, in the order they first appeared.
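The one-liner is terse, so here is a minimal sketch of what it does, using a more descriptive array name (seen instead of x) and made-up input. The second command is a variation, not part of the original tip: it keys on a single field rather than the whole line.

```shell
# Made-up input with repeated lines, deliberately unsorted.
dupes=$(mktemp)
printf '%s\n' pear apple pear fig apple > "$dupes"

# seen[$0]++ returns the count before incrementing: 0 the first time a
# line appears, non-zero afterwards. Negating it makes the pattern true
# only on the first occurrence, and a true pattern with no action prints.
unique=$(awk '!seen[$0]++' "$dupes")

# Same idea keyed on one field: keep the first line for each value of $1.
first_per_key=$(printf '%s\n' 'a 1' 'b 2' 'a 3' | awk '!seen[$1]++')

printf '%s\n' "$unique"
rm -f "$dupes"
```

Because the array holds every distinct line in memory, this approach suits files whose unique lines fit comfortably in RAM.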

man awk contains complete documentation of options. To get the most out of awk (or any Unix/Linux command) you need a good grasp of regular expressions, and for this I recommend the wonderful book Mastering Regular Expressions. If you want to pick the most useful thing of all to learn, learn regular expressions, because most programming languages and many Linux/Unix commands rely on them.

Samsung to Produce New iPad Screens: Report

Despite legal woes, Samsung will supply Apple with retina display screens for the new iPad range.

FLOSS Manuals Offers New, Useful Guides to Cool FOSS Applications

As a matter of course here at OStatic, we’re committed to compiling documentation and guidance resources for popular open source platforms and applications. After all, one of the most common criticisms of open source creations is the lack of official project documentation. One of the best ongoing projects for producing free open source-related documentation is FLOSS Manuals. It’s an ongoing and ambitious effort to build online guides for open source software. Recently, the site has added useful documentation for some projects that may interest you, including Firefox and the interesting collaborative editor Etherpad.

Read more at OStatic

Report: Red Hat to Switch Up MySQL for MariaDB

Back in 2010, when Oracle acquired Sun Microsystems, MySQL founder Monty Widenius was very vocal about the acquisition, and raised many questions about the future of the open MySQL database. In fact, we did an interview with Monty here on OStatic, where he said, “It’s clear that Oracle is in the game for the profit and it’s in their interest to get out as much money from MySQL as they can over the long term.”

Monty went on to become the lead developer of MariaDB, a fork of MySQL, and now there are reports that Red Hat will switch the default database in its RHEL enterprise product from MySQL to MariaDB upon the release of version 7. That’s a big vote of confidence in MariaDB.

Read more at OStatic

OpenSUSE 12.1 Reaches Its End of Life

The developers have announced that the version of the distribution that was released in November 2011 will no longer be maintained and that users should upgrade to the current openSUSE 12.3.

Read more at The H

Google’s Android Faces EU Probe Over Licensing Practices

European competition authorities have reportedly sent out a questionnaire to Google’s competitors in the mobile market, asking their opinions on its behaviors.

Read more at CNET News

Acer Now Eying Wearable Tech Too

The company is just the latest looking at the burgeoning market, according to Pocket-Lint.

Read more at CNET News

Coping with Loss (in Open Source)

Many software development teams use external components in their projects, libraries, or tools provided by commercial vendors or open source communities. However, as anyone who has ever had to scramble after a vendor has gone out of business can tell you, these external dependencies are not without risk. Software companies can fail, products can be discontinued and open source projects can stagnate. Components that aren’t being maintained or security risks that aren’t being addressed can put your project in a difficult situation.

 

Read more at Wazi

Development Release: openSUSE 13.1 Milestone 2

Andres Silva has announced the availability of the second milestone release of openSUSE 13.1: “For those of you waiting for (or working on) openSUSE 13.1, we have good news: milestone 2 is now out for you to download. As to be expected, the inclusion of newer software versions….

Read more at DistroWatch