August 29, 2007

Evolution beta is a powerful personal data mining tool

Author: Joe Barr

Roelof Temmingh has written a cool new application which provides individuals with the ability to do data mining of publicly available information. It's a cross-platform Java application called Evolution, currently in its second beta, and available as a free download.

UPDATE: Roelof Temmingh has removed the software from the website, saying in an announcement, "This is due to circumstances outside of my control. I am not sure how long this outage will last, but perhaps it will be permanent..."

H. D. Moore of Metasploit fame raved about Evolution during his packed-to-the-walls presentation at Defcon XV in Las Vegas. It's an impressive piece of work, even though it is still in beta and its GUI is not exactly the most intuitive I've ever seen.

If you're not familiar with the term "data mining," Wikipedia says it "has been defined as 'the nontrivial extraction of implicit, previously unknown, and potentially useful information from data' and 'the science of extracting useful information from large data sets or databases.'"

Still don't grok it? Think of the NSA sifting through network traffic, looking for actionable intelligence. Or if that's too conspiracy-minded for your taste, think of trying to find something new and meaningful in the results of a Google search on Paris Hilton. Evolution is kind of like that, but more aggressive in finding results, and a lot more aggressive in trying to make sense of them.

You can experiment with Evolution using either a classic or a wizard-assisted Web interface.

If you want to run the latest version of Evolution on your own desktop, first make sure you have Java 1.5 or later installed, then download the tarball and decompress it. Enter the paterva_ic3visualizer subdirectory created by the tar command and edit the configuration file evolution.conf found in the bin subdirectory. Fill in your user names and passwords for the social networking sites where you have accounts, uncomment those entries, and remove the entries for sites where you don't have an account.

You can start the program from the bin directory by entering ./paterva_ic3visualizer. A few seconds later, you should see an empty Evolution window appear. If you're running Mozilla -- you do remember the Mozilla browser, don't you? -- you're all set to begin your adventure, but if you're using Firefox, Netscape, or the Swing HTML browser you'll need to set the appropriate browser option the first time you run Evolution by clicking Options on the toolbar, and then clicking on System -> System Settings. From there, you can pick your poison from the drop-down menu for Web Browser. No other browser choices are available in the beta, so if you are using Opera or Konqueror, and you can't figure out how to make them work by hacking /etc/alternatives, you're just out of luck.

The Evolution GUI displays a toolbar across the top. Directly below it is a slider-widget to limit the maximum number of results, and a memory/garbage status display, which you can click at any time to force garbage collection. Beneath those two items is a large empty pane labeled Evolution Graph, which will hold the mining results. Along the right side is a column of three smaller panes: Graph Navigator, Palette, and Evolution Detail View.

The Graph Navigator gives you a thumbnail view of everything in the Graph. If the Graph holds more than can be displayed at once, the Graph Navigator marks the visible portion with a slightly darker shade of gray. You can navigate the Graph by dragging that darker gray window up and down or left and right in the Graph Navigator.

The Palette holds a number of Evolution entities. Some are infrastructure-related (domain, IP address, DNS name, and Web site) and some are personal entities (email address, person, location, phrase, phone number, and affiliation).
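As a rough illustration of how the Palette groups its entities, here is a minimal Python sketch. The class and function names are invented for this example and are not taken from Evolution itself; only the entity labels and the two groupings come from the description above.

```python
from enum import Enum


class Category(Enum):
    INFRASTRUCTURE = "infrastructure"
    PERSONAL = "personal"


class EntityType(Enum):
    """Palette entities as listed in the article (names here are hypothetical)."""

    DOMAIN = ("Domain", Category.INFRASTRUCTURE)
    IP_ADDRESS = ("IP Address", Category.INFRASTRUCTURE)
    DNS_NAME = ("DNS Name", Category.INFRASTRUCTURE)
    WEBSITE = ("Website", Category.INFRASTRUCTURE)
    EMAIL_ADDRESS = ("Email Address", Category.PERSONAL)
    PERSON = ("Person", Category.PERSONAL)
    LOCATION = ("Location", Category.PERSONAL)
    PHRASE = ("Phrase", Category.PERSONAL)
    PHONE_NUMBER = ("Phone Number", Category.PERSONAL)
    AFFILIATION = ("Affiliation", Category.PERSONAL)

    def __init__(self, label, category):
        # Each member's tuple value is unpacked into these two attributes.
        self.label = label
        self.category = category


def palette_by_category():
    """Group the entity labels by category, in declaration order."""
    groups = {category: [] for category in Category}
    for entity_type in EntityType:
        groups[entity_type.category].append(entity_type.label)
    return groups


if __name__ == "__main__":
    for category, labels in palette_by_category().items():
        print(category.value + ": " + ", ".join(labels))
```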

If you position the cursor over the Palette tab in the right-most column, the Palette menu will appear where the Graph Navigator had been displayed.

The Evolution Detail view shows you details about whatever entity currently has the focus in the Graph pane. Often, it will contain jumping-off points for your browser to link you to external sites for additional related information.

A trial run

To see how Evolution works, let's see what we can learn about a person. To begin, click and drag the Person entity from the Palette and drop it on the Graph.

It seems only fair to use Temmingh as the subject of our exercise. You can do that by double-clicking on "Name,Surname" in the Person entity box that appeared in the Graph after we dragged it there from the Palette, then typing Roelof,Temmingh -- with the comma and without any spaces -- and then pressing Enter.

Once we have a target we need to decide the type of information we want to learn about him.

Notice how the Palette pane has been reduced to a tab once again and that the Graph Navigator pane now holds a miniature map -- including the newly created Person entity -- of the Graph. Move the cursor over the Person entity in the Graph and the Evolution Detail View on the right becomes populated with information about the Person. Right-click on the Person entity, and a menu offering 24 different mining operations, called transforms, appears.

The lazy thing to do is to take the 25th option, which selects all the transforms. Let's do that and see what happens. By the way, I changed the limit for maximum number of results from its default of 5 to 10 using the slider-widget. That setting affects both the length of time it takes for the transforms to be performed and the amount of data returned.

Not much happens for about 45 seconds. A progress bar appears along the right side at the bottom of the GUI and the name of the transform executing displays next to it as Evolution moves from one transform to the next. When it's finished, Evolution populates the Graph pane with a couple of dozen new entities. If that's more than can fit in the pane on your system, a slider bar appears along the bottom of the GUI which allows you to scroll the pane horizontally to view the missing bits. As you do so, the Graph Navigator pane on the right shows you what portion of the Graph is being displayed, and what is not.

If you move the cursor over the topmost, leftmost entity created by the mining -- it's the DNS name www.guildmusic.com in my results -- the first thing you'll see is a line pointing from the target Person entity to the DNS Name entity. The Evolution Detail View now displays information about the DNS Name entity. Right-click on the DNS Name, and Evolution presents three additional transforms you might want to check. Select the Website option, and a Website entity appears. Put the cursor on it, and the Detail View pane reveals its properties: the URL, a thumbnail of its front page, and the server type and platform. Move the cursor over the thumbnail image, and Evolution will start an instance of the browser specified in Options, opened to that site's URL.

The lines linking the entities provide a visual reminder of how each entity was created, and come and go as you move the cursor between the various entities. For example, the original line between Person and DNS Name disappears while you're working with the Website entity, and a new line between it and the DNS Name appears.
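The idea of transforms producing new entities, each linked back to the entity it came from, can be sketched in a few lines of Python. This is only an illustrative model, not Evolution's actual design: the Entity and Graph classes and the two canned transforms are hypothetical, the transforms return hard-coded values instead of mining anything, and the Website value simply reuses the DNS name from the walkthrough.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Entity:
    kind: str   # e.g. "Person", "DNS Name", "Website"
    value: str


# A transform takes one entity and returns the entities it "mined" from it.
Transform = Callable[[Entity], List[Entity]]


@dataclass
class Graph:
    max_results: int = 5  # the article gives 5 as the slider's default
    entities: List[Entity] = field(default_factory=list)
    # Maps each derived entity to its source: the "line" drawn in the Graph.
    parent: Dict[Entity, Entity] = field(default_factory=dict)

    def add(self, entity: Entity) -> Entity:
        if entity not in self.entities:
            self.entities.append(entity)
        return entity

    def run(self, source: Entity, transform: Transform) -> List[Entity]:
        """Apply a transform, keep at most max_results, link new entities."""
        self.add(source)
        added = []
        for found in transform(source)[: self.max_results]:
            if found not in self.entities:
                self.entities.append(found)
                self.parent[found] = source
                added.append(found)
        return added

    def lineage(self, entity: Entity) -> List[Entity]:
        """Follow the links back to the original target, oldest first."""
        path = [entity]
        while path[-1] in self.parent:
            path.append(self.parent[path[-1]])
        return path[::-1]


def to_dns_names(person: Entity) -> List[Entity]:
    # Canned result standing in for a real lookup.
    return [Entity("DNS Name", "www.guildmusic.com")]


def to_website(dns_name: Entity) -> List[Entity]:
    # Canned result: reuse the DNS name as the site identifier.
    return [Entity("Website", dns_name.value)]


if __name__ == "__main__":
    graph = Graph()
    person = graph.add(Entity("Person", "Roelof,Temmingh"))
    (dns,) = graph.run(person, to_dns_names)
    (site,) = graph.run(dns, to_website)
    print(" -> ".join(e.kind for e in graph.lineage(site)))
```

Storing only a child-to-parent link mirrors what the GUI shows: for whichever entity has the focus, one line back to the entity that produced it.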

Here's a tip for viewing information in the Detail View. If you're trying to scroll down the Detail View pane in order to read additional information, but the pane empties when you move the cursor from the Graph to the Detail View, click once on the entity in the graph. The line to the entity, and the Detail View of that entity, will remain until you click somewhere else.

In addition to the DNS Name entities included in the results for Temmingh, there are also affiliations (ZoomInfo and GoogleBooks), email addresses, phone numbers, and other Person entities. One of them is Jeff Moss, founder of Black Hat and Defcon, and another is Tiian Van Aardt. Right-clicking on either of those two individuals brings up the same 25 transform options that we began our exercise with. I selected All again for Van Aardt, and after another half minute or so, I was rewarded with a whole new crop of entities in the Graph.

After just a couple of minutes, and a couple of clicks, I had already learned enough about Temmingh to piece together a picture of him and his associates. For one thing, he appears to come from a musical family. One relative with the same name is a well-known South African composer. For another, he is an acquaintance of both Jeff Moss and Tiian Van Aardt. Those names sound like they belong to a rock 'n' roll band, so we can probably conclude that they are all musicians, perhaps even members of the same band!

I'm joking, of course. But we probably don't even want to think about government bureaucrats making equally inane calls based on much more sophisticated tools using all available data: public, private, and classified.

To clear the GUI of your current data mining so you can start a new exploration, click Edit -> Select All on the toolbar, and then either press the Del key or click Edit -> Delete.

One of the new features in the beta 2 release is the incorporation of the newly popular WikiScanner, which allows you to see who has been changing entries on Wikipedia. Here's a quick peek at what you can do with it.

After starting Evolution as before, select DNS Name from the Palette and drag it to the Graph. Then double-click on the default name shown and replace it with microsoft.com. Now right-click on the entity and select the IP Address transform. That provides you with two IP addresses for microsoft.com. Right-click on the first one and select the Net Block transform, then right-click on the resulting Net Block and select Wiki Edits. You might want to slide the Maximum Number of Results all the way to the right before you do.
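The last step of that chain, going from a Net Block to Wiki Edits, comes down to matching the source addresses of anonymous edits against an address range. The sketch below shows that matching with Python's standard ipaddress module. It is an assumption about the general technique, not Evolution's or WikiScanner's code; the function name is made up, and the sample edits use reserved documentation address ranges and placeholder article titles, not real data.

```python
import ipaddress
from typing import Dict, List


def edits_from_net_block(net_block: str,
                         edits: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the edits whose source IP address falls inside net_block."""
    network = ipaddress.ip_network(net_block)
    return [e for e in edits if ipaddress.ip_address(e["ip"]) in network]


# Hypothetical edit log: documentation-only IP ranges, placeholder titles.
SAMPLE_EDITS = [
    {"ip": "192.0.2.17", "article": "Example article A"},
    {"ip": "198.51.100.4", "article": "Example article B"},
    {"ip": "192.0.2.200", "article": "Example article C"},
]


if __name__ == "__main__":
    for edit in edits_from_net_block("192.0.2.0/24", SAMPLE_EDITS):
        print(edit["ip"], edit["article"])
```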

The resulting entities show that someone from a Microsoft IP address has edited the Wikipedia entries for -- among other things -- linkage between Microsoft and SCO, the Chaos Computer Club, and Einstein's views on capitalism. Click on any of the results that interest you, then move the cursor to the Detail View pane. From there, you can go directly to a page in your browser showing the exact edits performed and the date they were made, or to a list of other edits made from the same IP address.

That should be enough to whet your appetite for more, especially if the idea of intelligence gathering -- whether for business, government, or personal reasons -- without breaking the law, and derived completely from public data, interests you. See the tips Temmingh has written for more things you can do with the current beta.

Conclusions

H. D. Moore was right -- this is a kick-ass application, just seething with power and potential. I will be following its development, and I suspect that many others will do the same, including a number of TLAs.

The addition of new transforms in the second beta, especially the one for Wiki Edits, proves the Evolution framework is mature enough for new transforms to be added as pluggable add-ons. It's scary to think how powerful this tool might become.

I asked Temmingh if he knew yet how he would license, sell, or distribute Evolution when it's finished. He said that he needs to make some money from Evolution or it will die. He is considering everything from advertising to subscriptions, or selling the GUI and transforms, or selling only the GUI and making the transforms open source, and he is open to other suggestions.

If he decides to sell the GUI, he is undecided on the price, saying only that "it needs to make sense for me to do it. While I love working on this, eventually we all need to eat."
