June 28, 2005

Stemming the menace of wiki spamming

Author: Rob Sutherland

Do you maintain a wiki that you haven't checked for a while? Better take a look at it. Over the past year or so, an increasing number of wikis and wiki variants have become targets of spam. Wiki developers and operators are trying to find the right line between complete vulnerability and complete inflexibility. How can the wiki community keep wikis open and easy to use, yet stamp out the spam?

Spam directed against a wiki usually consists of lists of links inserted into standard pages, such as user profiles or sandbox areas. Although the link text often looks innocent, the actual URL points to a phishing, gambling, or porn site. By planting links on hundreds or thousands of pages, wiki spammers increase both the number of links available for people to click on and the number of inbound links that search engines count, which bumps their sites higher in search results.

A list of reported wiki spam incidents and a Google search on known wiki spam names show that this is a growing problem.

The wiki community has been grappling with spam at length. The theory in the past has been that wikis are self-maintaining, in the sense that community editing will remove or polish entries over time. That may not be true. To see why, consider that a spam posting may go unnoticed for some time, especially if it's posted to a "ghost wiki" that is running more or less unattended. Even if the spam is noticed, cleaning it up is more work than simply deleting an offending page, because wikis retain revision histories by default. Even popular wikis face a further complication: legitimate users are less likely to participate in a wiki if many of its postings are bogus and link to unappealing and dangerous sites, which leaves fewer people to notice the spam and help clean it up.

Another factor that attracts spammers is that a wiki's registration and contribution process needs to be as streamlined and simple as possible; if it isn't, legitimate users will become frustrated and go elsewhere. Ironically, a wiki's greatest vulnerability is also its greatest strength: the openness and ease of modification that make wikis useful also allow the irresponsible and the criminal to add bogus content and dilute the value of the shared resource.

There seems to be agreement that "wiki spam is a wikiwide problem that needs to be solved wikiwide." Furthermore, wiki spam has to be fought with different approaches than email or blog comment spam; both of those are subject to bottlenecks in a way that wikis are not. There is also a community element to address: it's not simply a matter of installing a Bayesian filter, implementing something like greylisting, or setting up a blacklist. You have to consider how the use and misuse of a particular tool will affect a varied (very varied) community. For example, if you set up a blacklist as a wiki page and let members submit the IP addresses of black-hat spammers so as to take advantage of community support, how do you stop people from using the tool to take revenge on their enemies in the flame war du jour?

The best we can do right now is put a toolkit together and see what works. The current toolset consists of three main categories:

  • Troll control tools: Behavior-based banlists, blacklists, whitelists, and schemes to flag spamlike behavior, such as too many posts in too short a time, along with content analysis to pick out spamlike posts. There's some discussion of using SpamAssassin for the content analysis.
  • User verification: A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is usually a generated image of some text that you have to type in to proceed. It is meant to stop automated spamming; if a real person is doing the spamming, at least the rate of spam is limited to what a person can type. An email verification process combined with a blacklist of known spam domains will also slow down human-generated spam.
  • After the fact: Checking for and removing spam pages, contributing information on spammers to the community, complaining to the abusers' ISPs, and publicizing their use of illegal and inappropriate methods. The Chongqed wiki uses an interesting publicity technique: it encourages wiki operators to link known spam keywords to the chongqed.org site so that, over time, searches for those keywords will point there rather than to the spammers' target sites. Chongqed also maintains a list of sites whose links have appeared in wiki spam.
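None of the troll-control or verification tools above is spelled out in detail, so here is a minimal Python sketch of three checks a wiki could run before accepting an edit or a registration. It includes a sliding-window rate limit for "too many posts in too short a time," a crude link count standing in for real content analysis such as SpamAssassin, and an email-domain blacklist. The class name, thresholds, and blocked domains are hypothetical, and the code is not taken from any particular wiki engine.

```python
import re
import time
from collections import defaultdict, deque

# Hypothetical thresholds; a real wiki would tune these to its own traffic.
MAX_EDITS = 5            # edits allowed per editor...
WINDOW_SECONDS = 60.0    # ...within this sliding window
MAX_LINKS = 10           # more absolute URLs than this in one edit looks spamlike

# Hypothetical blacklist of known spam email domains.
BLOCKED_EMAIL_DOMAINS = {"spam.example", "junkmail.example"}

URL_RE = re.compile(r"https?://", re.IGNORECASE)


class SpamScreen:
    """Flags spamlike behavior before an edit or registration is accepted."""

    def __init__(self):
        # Editor key (user name or IP address) -> timestamps of recent edits.
        self._recent = defaultdict(deque)

    def too_many_edits(self, editor, now=None):
        """Record one edit, then report whether the editor has made more
        than MAX_EDITS edits within the last WINDOW_SECONDS."""
        now = time.monotonic() if now is None else now
        stamps = self._recent[editor]
        # Discard timestamps that have slid out of the window.
        while stamps and now - stamps[0] > WINDOW_SECONDS:
            stamps.popleft()
        stamps.append(now)
        return len(stamps) > MAX_EDITS

    @staticmethod
    def too_many_links(text):
        """Crude content check: count absolute http(s) URLs in the text."""
        return len(URL_RE.findall(text)) > MAX_LINKS

    @staticmethod
    def blocked_email(address):
        """True if the registration address uses a blacklisted domain."""
        domain = address.rsplit("@", 1)[-1].lower()
        return domain in BLOCKED_EMAIL_DOMAINS

    def should_flag(self, editor, text, now=None):
        """Combine the behavior and content checks for a single edit."""
        # Run the rate check first so the edit is always recorded,
        # even when the link check alone would already flag it.
        rate_hit = self.too_many_edits(editor, now)
        return rate_hit or self.too_many_links(text)


if __name__ == "__main__":
    screen = SpamScreen()
    links = "\n".join(f"http://casino{i}.example/" for i in range(20))
    print(screen.should_flag("203.0.113.7", links))   # True: 20 URLs in one edit
    print(screen.blocked_email("bot@spam.example"))   # True
```

A sliding window keeps exact timestamps for each editor. A busy wiki might prefer a cheaper counter, and it might send flagged edits to a moderator rather than reject them outright.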

So, when you discover your wiki has been spammed, what should you do (after you remove the spam, of course)?

  • Go to the support/development site for the wiki software you use and look for resources and discussion on best practices for blocking wiki spam. Last week a TWiki wiki I am responsible for was hit by a spam bot that registered using a stolen ID and then added a number of links to spam and phishing sites based in China. I found a topic on wiki spam that led me to the BlackList plugin for TWiki. I installed it and added the offending IP address. I haven't seen any spam since then, but I'm sure the fun isn't over.
  • Google recommends adding rel="nofollow" to hyperlinks so that spammers gain no search-ranking benefit from comment spam. The technique works just as well for links that spammers put into wiki topics, and it is part of the TWiki BlackList plugin.
  • If you've put up a ghost wiki that isn't being used, consider taking it down. If it's being used for evaluation, consider protecting the main directory with an .htaccess login or some other mechanism that keeps spammers out.
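The two remedies in the list above, blocking edits from blacklisted IP addresses and marking outbound links with rel="nofollow", can be roughly illustrated with the following minimal Python sketch. It is not the TWiki BlackList plugin's code. The blacklist entries are taken from documentation-reserved address ranges. The function names are hypothetical, and so is the assumption that a save hook sees the edit as rendered HTML.

```python
import ipaddress
import re

# Hypothetical blacklist: single addresses or CIDR ranges. The entries use
# documentation-only address ranges as stand-ins for real offenders.
BLACKLIST = [ipaddress.ip_network(entry) for entry in (
    "203.0.113.7/32",    # e.g. the address a spam bot registered from
    "198.51.100.0/24",   # a whole range, if one address isn't enough
)]

OPEN_ANCHOR_RE = re.compile(r"<a\s[^>]*>", re.IGNORECASE)
EXTERNAL_HREF_RE = re.compile(r"""\shref\s*=\s*["']?https?://""", re.IGNORECASE)
HAS_REL_RE = re.compile(r"\srel\s*=", re.IGNORECASE)


def is_blacklisted(client_ip):
    """True if the editor's address falls inside any blacklisted network."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv6 address is never "in" an IPv4 network, so mixed lists are safe.
    return any(addr in net for net in BLACKLIST)


def add_nofollow(html):
    """Add rel="nofollow" to opening <a> tags that point at absolute
    http(s) URLs. Internal wiki links are left alone, and tags that
    already carry a rel attribute are skipped rather than merged."""
    def fix(match):
        tag = match.group(0)
        if not EXTERNAL_HREF_RE.search(tag) or HAS_REL_RE.search(tag):
            return tag
        return tag[:-1] + ' rel="nofollow">'
    return OPEN_ANCHOR_RE.sub(fix, html)


def process_edit(client_ip, rendered_html):
    """Hypothetical save hook: reject the edit (None) if the IP is
    blacklisted, otherwise return the HTML with external links nofollowed."""
    if is_blacklisted(client_ip):
        return None
    return add_nofollow(rendered_html)


if __name__ == "__main__":
    html = '<a href="http://casino.example/">cheap</a> <a href="/wiki/Main">home</a>'
    print(process_edit("203.0.113.7", html))   # None: blacklisted address
    print(process_edit("192.0.2.1", html))
    # <a href="http://casino.example/" rel="nofollow">cheap</a> <a href="/wiki/Main">home</a>
```

A regular expression is enough for a sketch. A production filter would more likely hook into the wiki's own link renderer or use a real HTML parser.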

Naturally, there are a lot of wiki pages discussing the various evils of, and responses to, wiki spam. I also maintain my own page on the subject. Chongqed has a good list of resources, and the Meatball wiki has a fairly good article with background and definitions.

Like everything else about the world of wikis, the discussion of spam fighting is both extensive and opinionated. You may have to do some digging to get your particular questions answered, but persevere. Despite the overabundance of rhetoric, there are numerous skilled individuals trying to protect their personal investment and community from misuse.

Luckily, most wiki spammers are fairly crude in their efforts and therefore easy to spot and block. However, that could quickly change as wiki spammers become more subtle and capable of getting around current impediments, such as blacklists, by using the same techniques email spammers do. When that occurs, we'll have to update our tools and create newer and more sophisticated ones.

Unfortunately, there is no clear victory in sight. The best we can hope for is to react quickly, utilizing our community support and rapid dissemination of information, to defeat each new round of wiki spammers.
