Open Source Datamining for Social Media Accounts with ThinkUp


Proprietary social networking platforms have a few distinct issues for free software users, but one of the biggest is that it is often hard — if not impossible — to extract your information from them. With Twitter, for example, you can scroll down to the bottom of the page and wait for more tweets to load via JavaScript, but you can’t sort and analyze them yourself. But that’s exactly what the open source application ThinkUp does for you.

The code for ThinkUp was recently declared 1.0, and is stable for multi-user setups with multiple services.

Using the official APIs of various social media services, ThinkUp lets you import all of your own posts, and drill down through the relevant information about them — which proved popular with other users, what topics are the hottest, and so on. No hacking or terms-of-service violation is required. Plus, since it’s open source, you can find a number of plugins to extend the service to new platforms. ThinkUp is licensed under the GPLv3, and is written in PHP with a MySQL database back-end. All you need to get started is a standard LAMP server.

Installation and Setup

The ThinkUp code is hosted at GitHub, but if you are installing it for the first time, downloading the pre-packaged zip archive may be simpler. The 1.0 release is from November 15, and includes pre-packaged support for Google Plus, Twitter, and Facebook accounts, plus a GeoEncoding plugin that plots replies and discussion threads on a map. Those who are not interested in running their own server can also try the ThinkUp Launcher, which creates a ThinkUp instance on your Amazon EC2 cloud service account with virtually no setup required.

For everyone installing ThinkUp on an existing server, however, the procedure will seem quite familiar: unpack the archive into a location under the web root (ThinkUp does not need to be run in the root directory of the virtual server), create MySQL database credentials (database user, password, and empty database), then visit the installation URL from a browser. After you click through the painless installer script, you’ll be required to set up your first user account, which will have administrative privileges.
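As a rough illustration of the database step, the short Python sketch below only generates the SQL statements you would run as a MySQL administrator; it does not connect to anything. The function name, the database name, and the user name are all hypothetical placeholders, not anything ThinkUp requires, and the installer may expect privileges or settings beyond what is shown here.

```python
def thinkup_db_setup_sql(database, user, password, host="localhost"):
    """Return SQL statements that create an empty database and a dedicated user.

    All names passed in are placeholders chosen by you; nothing here is
    mandated by ThinkUp itself.
    """
    # Refuse identifiers with unexpected characters instead of trying to quote them.
    for name in (database, user, host):
        if not name.replace("_", "").replace(".", "").isalnum():
            raise ValueError(f"unexpected characters in {name!r}")
    # Escape backslashes and single quotes so the password stays inside its literal.
    escaped = password.replace("\\", "\\\\").replace("'", "\\'")
    return [
        f"CREATE DATABASE `{database}`;",
        f"CREATE USER '{user}'@'{host}' IDENTIFIED BY '{escaped}';",
        f"GRANT ALL PRIVILEGES ON `{database}`.* TO '{user}'@'{host}';",
    ]


if __name__ == "__main__":
    # Print the statements so they can be pasted into a MySQL client session.
    print("\n".join(thinkup_db_setup_sql("thinkup", "thinkup_user", "change-me")))
```

The database name, user name, and password you choose here are the values the web installer will ask for.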

Before you can use ThinkUp, however, you will also need to configure each of the plugins. Click on the “Settings” button in the upper right-hand corner. The first tab on the settings page is for plugins: Facebook, Twitter, Google Plus, the GeoEncoder, and a URL expander (to improve tracking for those ubiquitous shortened URLs) are installed by default, although the GeoEncoder is deactivated at first installation.

Activating the plugins is a prerequisite, but to get them working, you need to click on each one’s title and follow the set-up instructions. In each case, the instructions show you step-by-step how to establish the authorized API key required for a new web application to connect to the social networking platform of choice. These API keys are specific to the installation. In other words, your ThinkUp server must have its own Twitter API key in order to connect to Twitter, its own Google Plus API key in order to connect to Google Plus, and so forth.

After you correctly set up API keys with all of the services that interest you, you still need to link your user account(s) to the plugins. That sounds like more steps than it really is: you only need to set up one API key per service, but your app can connect to as many Twitter accounts as you want. You can support multiple users for your project, family, or office, or if you maintain more than one personal account, you can connect to all of them yourself.

Finally, after the plugins are set up and the accounts linked, you need to tell ThinkUp to fetch and process your data. Click on the ThinkUp logo to return to the user dashboard page (the fact that the logo is the only link to the dashboard is one of my pet peeves), and click on the “Update now” button. The app will retrieve your posts and metadata (including replies, followers, retweets, and so on) through the API. If you have multiple accounts configured, you must switch between the services via the upper left-hand services menu to make sure you fetch data for all of them. The process of retrieving the data can be time-consuming, particularly at first, so do not close or navigate away from the ThinkUp window.

Drilling for Data

ThinkUp provides you with multiple mechanisms for digging into your data stream and extracting interesting views from it, although exactly which tools are available depends on the plugin used (and, ultimately, on the service’s API). Generally speaking, however, you get access to “hot” topics from your most recent posts, a bar-graph view of recent activity (comments and replies, retweets and re-shares), and the ability to select individual posts and explore how often they spawned discussions and, with the GeoEncoder enabled, where.

You can also search through your content, access shared objects (including links and images) posted by your friends and followers, and sort through your older content for historical data on re-shares, feedback, and other items of interest. For platforms that support it, ThinkUp produces slick time-charts showing follower and list membership counts.

You also have access to a variety of built-in “metrics” that attempt to suss out less obvious trends. For instance, the Twitter plugin breaks your tweets down into “conversationalist” and “broadcaster” categories, based on whether or not you include @-replies or links. This is supposedly the sort of data that the social-media-introspection site Klout bases its ratings on; whether or not you find it useful probably says as much about you as it does about the software. The breakdown of which of your followers are “chatterboxes” and “deadbeats” is also unlikely to come as a surprise, but it can help you weed out the dead weight if you are using one of these services for professional reasons.
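To make the “conversationalist” versus “broadcaster” idea concrete, here is a minimal Python sketch of one way such a split could be computed. It is an assumption for illustration, not ThinkUp’s actual algorithm: a tweet that opens with an @-mention is counted as conversation, a tweet that contains a link is counted as broadcasting, and everything else is left uncategorized. The function names and the sample tweets are made up.

```python
import re
from collections import Counter

# Assumed heuristics: a leading @-mention marks a reply, an http(s) URL marks a link.
MENTION_AT_START = re.compile(r"^\s*@\w+")
LINK = re.compile(r"https?://\S+")


def classify(text):
    """Label one tweet using the assumed reply/link heuristics."""
    if MENTION_AT_START.search(text):
        return "conversationalist"
    if LINK.search(text):
        return "broadcaster"
    return "other"


def breakdown(tweets):
    """Count how many tweets fall into each category."""
    return Counter(classify(t) for t in tweets)


if __name__ == "__main__":
    sample = [
        "@alice thanks for the pointer!",
        "New article is up: http://example.com/thinkup",
        "Time for lunch",
    ]
    print(breakdown(sample))
```

A reply that also contains a link is counted as conversation here; that ordering is an arbitrary choice in the sketch.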

On the whole, however, I was a little disappointed in the depth of the analysis tools provided with the 1.0 release. To me, which of my posts received the most attention this week could prove useful, but it is also fairly simple to figure out in the existing Twitter and Google Plus interfaces. I kept asking myself, where are the mash-ups? ThinkUp knows the popularity of my friends, and which of my posts get re-shared the most. Is it too much to ask that it put those facts together, and tell me which of my posts proved most popular with “important” users? That would be valuable.
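The kind of mash-up I have in mind is not complicated. The Python sketch below is purely hypothetical: it assumes you already have a list of (post, re-sharing user) pairs and a table of follower counts, treats anyone above an arbitrary follower threshold as “important,” and ranks posts by how many such users re-shared them. None of these names or data structures come from ThinkUp.

```python
def rank_by_influential_reshares(reshares, follower_counts, min_followers=1000):
    """Rank posts by the number of re-shares from users above a follower threshold.

    reshares: iterable of (post_id, user) pairs (hypothetical input format)
    follower_counts: dict mapping user -> follower count
    min_followers: arbitrary cut-off for who counts as "important"
    """
    scores = {}
    for post_id, user in reshares:
        # Users missing from the table are treated as having zero followers.
        if follower_counts.get(user, 0) >= min_followers:
            scores[post_id] = scores.get(post_id, 0) + 1
    # Highest score first; ties are broken by post id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    reshares = [("p1", "ann"), ("p1", "cat"), ("p2", "bob"), ("p3", "cat")]
    followers = {"ann": 5000, "bob": 20, "cat": 1500}
    for post_id, count in rank_by_influential_reshares(reshares, followers):
        print(post_id, count)
```

Posts that were only re-shared by low-follower accounts drop out of the ranking entirely, which is the point of the exercise.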

Admittedly, the GeoEncoder plugin is supposed to do something along these lines, showing you a geographic distribution of responses. But I could never get the GeoEncoder plugin to work. Partly this may be because I tested the application on a Dreamhost shared server, which according to the wiki will cause timeouts. But I can’t entirely chalk the trouble up to that; my server did not experience the timeouts in data crawling that the wiki said I would — it was only the GeoEncoder that failed to function (and, naturally, that plugin provides no feedback or error log).

Sure, in open source, the answer to this type of concern is “you can fix it,” and there appears to be a healthy ecosystem of developers around ThinkUp, of whom presumably a decent number are interested in writing plugins for new network services (such as Status.Net) or for mashing up the existing ones. Third-party plugins are not particularly easy to find, though. You can browse the mailing list and get a fair idea of what people are working on, but I wasn’t able to find a list of third-party plugins on the wiki, for example. But there are things brewing within the company, most notably a “live tweet” stream plugin; with its addition, ThinkUp morphs into more of a desktop-like Twitter exploration tool.

It’s a good glimpse at where the future might take the app. The company behind ThinkUp has a roadmap and clues to a redesign for the 2.0 revision, too. But no matter where it heads next, ThinkUp has one killer feature that I have not even mentioned yet: you can export all of your data, from every platform, in raw form suitable for analysis in a spreadsheet or external database.

That is probably the most powerful feature of the system, if not the most immediately useful, and one that the network services generally do not offer. But if you can manipulate your data into a more useful form in a spreadsheet or another application, chances are you can write a plugin that ThinkUp can use to automate the process as well, and giving users a friendly way to explore their data is what ThinkUp is all about.
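As one illustration of what you might do with exported data outside ThinkUp, the following Python sketch ranks posts by combined replies and retweets using only the standard library. The column names (post_text, reply_count, retweet_count) are assumptions made for this example; check the header row of your own export and adjust them to match.

```python
import csv
import io


def top_posts(csv_file, n=3):
    """Return the n posts with the most combined replies and retweets.

    csv_file: an open text file (or file-like object) with a header row.
    The column names used below are hypothetical.
    """
    rows = list(csv.DictReader(csv_file))

    def engagement(row):
        # Empty cells are treated as zero.
        return int(row["reply_count"] or 0) + int(row["retweet_count"] or 0)

    rows.sort(key=engagement, reverse=True)
    return [(row["post_text"], engagement(row)) for row in rows[:n]]


if __name__ == "__main__":
    # Stand-in for an exported file; real data would come from open("export.csv").
    sample = io.StringIO(
        "post_text,reply_count,retweet_count\n"
        "Hello world,2,1\n"
        "ThinkUp 1.0 is out,5,12\n"
        "Lunch,,0\n"
    )
    for text, score in top_posts(sample, n=2):
        print(score, text)
```

Once a calculation like this proves useful in a script or a spreadsheet, it becomes a natural candidate for a plugin.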