Put Your Web Hits on the Map with libferris and XQuery

Article Source Linux Developer Network
May 27, 2009, 6:49 am

In this article, you will learn how to take the IP addresses and other information from an apache access.log file and see those IP addresses as place marks with Google Earth and Google Maps. We’ll use XQuery to do this instead of Perl or your favorite scripting language. We won’t have to worry about breaking up the lines in access.log or opening connections to relational databases to lookup the location of IP addresses in a table — with the recent advancements in libferris both access.log and relational databases are both already at your fingertips from an XQuery.

Because this article touches on many technologies, I’ll omit an introduction to XQuery. You should be able to get the gist of the XQuery code shown at the end of the article, which I’ll also describe. But the details of XQuery are left for you to explore in other tutorials.

Because we want to deal with geographic data, specifically IP addresses placed at locations and cities, using KML format XML files makes sense. KML files can be loaded and saved by Google Earth and Google maps, making them a good format for showing graphically on maps where IP addresses are. This is another reason that XQuery works well, you can easily generate an XML document directly from XQuery.

The libferris project started about 10 years ago with the goal of creating a powerful virtual filesystem. Along the way it has gained the ability to mount XML, relational databases, Berkeley db and other ISAM files, and anything else that makes sense. While having everything available as a virtual filesystem is nice, at times you might prefer to use a different interface than a filesystem. So you can now access any libferris filesystem as a virtual XML Document Object Model (DOM), an XQilla document, an SQLite virtual table, or through XSLT.

The traditional “mounting” filesystem operation is mostly handled for you by libferris. If you access a directory and libferris knows what to do to mount it in a meaningful way then it does so. For example, listing postgresql://localhost/mydb will list all of the tables in the mydb database on localhost. No mounting is needed, just grab the data you want. This works well when using libferris with XQuery because you can just peek into any libferris filesystem directly.

I’ll talk about how to map an IP address to city and a latitude, longitude pair first, then turn attention to getting at the data contained in apache’s access.log files from libferris, and finally turn to XQuery to bring it all together.

IP to Location

To resolve an IP address to the city, country and digital longitude and latitude values I’ll use the free IP address geolocation SQL database. The free IP address database download is targeted to a MySQL database. I used the mysql2pgsql tool to convert the SQL file and import into a PostgreSQL database. The main reasons for this are that I prefer PostgreSQL and libferris has better support for mounting PostgreSQL as a filesystem than other relational database servers. Since I am the author of libferris, these two reasons are not unrelated. Of particular interest for this article, you can call a PostgreSQL database function through the libferris filesystem interface which is not currently implemented for other databases.

The IP address database uses the MySQL inet_aton() function. A replacement for PostgreSQL can be found here. The final touch is to wrap up the IP to geographic information into a custom PostgreSQL function as shown below. First the return type of the function is defined, cc is the country code, reg is the region, ipn is the numerical version of the dotted IP address and ip is the dotted IP address. The first select statement illustrates this, is the dotted IP address and 1163556352 would be the numerical version (ipn).

The ipgeo function takes a dotted IP address and returns the location information if available as shown in the final example query.

select inet_aton('') as ipn

create type ipgeoresult as
( cc varchar,
reg varchar,
city varchar,
lat double precision,
long double precision,
ipn bigint,
ip text );

CREATE OR REPLACE FUNCTION ipgeo( ip varchar ) RETURNS ipgeoresult AS '
select country_code as cc,
region_code as reg,
city,latitude as lat,longitude as long,
ip_start as ipn,inet_ntoa(ip_start) as ip
from ip_group_city
where ip_start <= inet_aton($1)
order by ip_start desc
limit 1;

select city,lat,long from ipgeo('');
city | lat | long
Edmonton | 53.55 | -113.5

 Read more at the Linux Developer Network