January 2, 2008

Maltego mines the Internet without violating TOS

Author: Joe Barr

Not long after Linux.com reviewed Roelof Temmingh's powerful online data mining tool Paterva Evolution a few months ago, Temmingh was forced to remove the application from the Paterva Web site because of complaints that some of the methods he used to harvest data were violating the terms of service (TOS) of the services from which the information was gathered. Recently, Temmingh released a completely redesigned version of the tool -- now called Maltego -- and has made it available again as a free-as-in-beer download.

Like the original, Maltego is written in Java, and it still requires Java Development Kit (JDK) 1.5 or later to run. The GUI looks and behaves much as it did before, but almost everything is new and different under the hood.

Maltego uses numerous methods to search for public information about a variety of entities, such as individuals, phrases, email addresses, URLs, and domain names. These methods are referred to as transforms. In the original release, these transforms were coded as part of the application; in Maltego, they are not. The tool has been redesigned so that the Maltego client you run on your machine relies on a server -- called a Transform Application Server -- which collects and processes the information found by the transforms, then returns the results to the client.

This design allows others to write transforms, set up their own Transform Application Servers, or even add their own entity types to conduct searches for virtually any type of information. Users can now modify or add transforms without needing to reinstall the software. The new architecture also allows the use of transforms to be restricted or controlled through an individual API, thus avoiding the kind of complaints the original design drew from search firms, social networks, and others for violating their TOS.
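The split between a thin client and a server that holds the transforms can be pictured with a small Python sketch. It is not Maltego's actual API, which this article does not describe: the names Entity, TransformServer, and domain_to_mail_host are hypothetical, the example transform simply guesses a host name rather than looking anything up, and everything runs in one process instead of over a network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Entity:
    """A typed piece of information, e.g. a domain name or an email address."""
    kind: str
    value: str


# A transform takes one entity and returns the related entities it finds.
Transform = Callable[[Entity], List[Entity]]


class TransformServer:
    """Hypothetical stand-in for a Transform Application Server."""

    def __init__(self) -> None:
        # Maps an input entity kind to the named transforms that accept it.
        self._transforms: Dict[str, Dict[str, Transform]] = {}

    def register(self, input_kind: str, name: str, transform: Transform) -> None:
        """Add or replace a transform; the client code does not change."""
        self._transforms.setdefault(input_kind, {})[name] = transform

    def available(self, input_kind: str) -> List[str]:
        """List the transform names that accept the given entity kind."""
        return sorted(self._transforms.get(input_kind, {}))

    def run(self, name: str, entity: Entity) -> List[Entity]:
        """Run one named transform against an entity and return its results."""
        try:
            transform = self._transforms[entity.kind][name]
        except KeyError:
            raise LookupError(
                f"no transform {name!r} for entity kind {entity.kind!r}"
            ) from None
        return transform(entity)


def domain_to_mail_host(entity: Entity) -> List[Entity]:
    """Toy transform: guesses a mail host name (a real one would query DNS)."""
    return [Entity("host", "mail." + entity.value)]


# The "client" side: ask the server what it offers, then run a transform.
server = TransformServer()
server.register("domain", "DomainToMailHost", domain_to_mail_host)
results = server.run("DomainToMailHost", Entity("domain", "example.com"))
print(server.available("domain"), results)
```

Because transforms are looked up by name on the server side, registering a new one, or a new entity kind, requires no change to the client, which is the property the redesign is after.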

Temmingh says that the new design is faster because the transforms do not run locally, "even when you're low on bandwidth. I've tested it with a slow GPRS connection, and it's still very usable."

On the downside, Maltego has less access to information than the original. Unlike Evolution, it can't use Google, because Google's TOS frowns on "scraping," as automated searches are called. Some may find this a bit odd, since Google itself scrapes the Internet to find and catalog the same information it now denies to others using similar tools.

Instead of Google, Maltego uses Yahoo! for its search engine transforms, but limits the number of times Yahoo! can be used during a 24-hour period. Most transforms involving the search of social networking sites have also had to be stripped out for similar concerns over violating their TOS. Two new transforms -- one for RapLeaf and the other for Spock -- try to plug that hole.
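A cap on how often a search engine may be queried in a 24-hour period can be sketched as a simple quota counter. The article does not say how Maltego enforces its limit or what the limit is, so this is only an illustration under assumptions: the DailyQuota class, its rolling-window behavior, and the limit of 2 used below are all hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class DailyQuota:
    """Allows at most `limit` calls in any rolling window of `window` seconds."""

    def __init__(self, limit: int, window: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self._clock = clock                # injectable so it can be tested
        self._calls: Deque[float] = deque()  # timestamps of accepted calls

    def try_acquire(self) -> bool:
        """Return True and record the call if quota remains, else False."""
        now = self._clock()
        # Forget calls that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) >= self.limit:
            return False
        self._calls.append(now)
        return True


# Demonstration with a fake clock so no real waiting is needed.
fake_now = [0.0]
quota = DailyQuota(limit=2, clock=lambda: fake_now[0])
first = quota.try_acquire()    # accepted
second = quota.try_acquire()   # accepted
third = quota.try_acquire()    # refused: quota used up
fake_now[0] = 24 * 60 * 60     # a full day later
fourth = quota.try_acquire()   # accepted again
print(first, second, third, fourth)
```

A transform that wraps a search engine would call try_acquire() before each query and skip or postpone the query when it returns False.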

With all these changes, is Maltego still a powerful tool for doing your own data mining on the Internet? It proved to be so the first time I used it. It took me about 15 minutes to learn how to manage transforms so that I could make use of all 64 of them available; the project provides good user documentation on this subject. Once I figured it out, I was immediately able to find new information on a subject of current interest.

Temmingh says that Maltego will probably go commercial in a future version, perhaps by selling the GUI as commercial software, perhaps by selling custom transforms. The Maltego GUI is now based on a model similar to that of the Metasploit Project, and like that project, with its plugin exploits and payloads, the real power comes from the transforms and servers. As such, it might prove to be a lucrative offering for those with a hankering for customized and controlled intelligence gathering.

