Linux.com


Stemming the menace of wiki spamming

June 28, 2005 (8:00:00 AM)

By: Rob Sutherland

Do you maintain a wiki that you haven't checked for a while? Better take a look at it. Over the past year or so, an increasing number of wikis and wiki variants have become targets of spam. Wiki developers and operators are trying to find the right balance between complete vulnerability and complete inflexibility. How can the wiki community keep wikis open and easy to use, yet stamp out the spam?

Spam directed at a wiki usually consists of lists of links inserted into standard pages, such as user profiles or sandbox areas. Although the link text often looks innocent, the actual URL points to a phishing, gambling, or porn site. By planting links on hundreds or thousands of pages, wiki spammers increase the number of links that people can click on and, because search engines count inbound links when ranking pages, bump their sites higher in search results.
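As a rough illustration of why crude link-list spam is easy to spot mechanically, the following Python sketch compares an edit with the previous revision and counts the external URLs it adds. The function names, the URL regular expression, and the threshold `MAX_NEW_LINKS = 5` are hypothetical choices for this example, not part of any particular wiki engine.

```python
import re

# Simplified pattern for http/https URLs in raw page text. It ignores
# wiki-specific link syntax, so treat it as an approximation.
URL_PATTERN = re.compile(r"https?://[^\s\[\]<>\"']+", re.IGNORECASE)

# Hypothetical threshold: edits that add more new external links than
# this are held for review. Tune it for your own wiki.
MAX_NEW_LINKS = 5


def extract_urls(text: str) -> set:
    """Return the set of external URLs found in a page's source text."""
    return set(URL_PATTERN.findall(text))


def new_links(old_text: str, new_text: str) -> set:
    """URLs that appear in the new revision but not in the old one."""
    return extract_urls(new_text) - extract_urls(old_text)


def looks_like_link_spam(old_text: str, new_text: str,
                         limit: int = MAX_NEW_LINKS) -> bool:
    """Flag an edit that inserts an unusually long list of new links."""
    return len(new_links(old_text, new_text)) > limit


if __name__ == "__main__":
    before = "Welcome to the sandbox. See http://example.org/help"
    # Simulate a spammer appending ten bracketed external links.
    spam = "\n".join(
        f"* [http://spam{i}.example.com/ cheap tickets]" for i in range(10)
    )
    after = before + "\n" + spam
    print(len(new_links(before, after)))         # 10
    print(looks_like_link_spam(before, after))   # True
    print(looks_like_link_spam(before, before))  # False
```

The assumption behind this heuristic is that legitimate edits seldom add a long list of new external links at once, so holding such edits for review catches the crude case. A real deployment would need to handle the wiki's own link syntax and tune the threshold.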

A list of reported wiki spam incidents and a Google search on known wiki spam names show that this is a growing problem.

The wiki community has been grappling with spam at length. The theory in the past has been that wikis are self-maintaining, in the sense that community editing will remove or polish entries over time. That may not be true. For one thing, spam may go unnoticed for some time, especially if it's posted to a "ghost wiki" that runs more or less unattended. Even when spam is noticed, cleaning it up is more work than simply deleting an offending page: because the spam is usually inserted into existing pages and wikis retain revision histories by default, each affected page has to be restored to an earlier, clean revision. Complicating the problem even for popular wikis, legitimate users are less likely to participate in a wiki where many of the postings are bogus and link to unappealing or dangerous sites, which leaves fewer people to notice the spam and help clean it up.
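The extra cleanup work can be sketched in Python. Assuming a simplified, hypothetical page history (a list of `Revision` objects, oldest first) and a list of known spam domains, cleanup means finding the newest revision that contains no spam and saving its text as a new revision, while the spammed revisions stay in the history. None of these names correspond to a real wiki engine's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Revision:
    """One entry in a page's history (a simplified, hypothetical model)."""
    number: int
    author: str
    text: str


def is_clean(text: str, spam_markers: List[str]) -> bool:
    """True if none of the known spam domains or URLs appear in the text."""
    lowered = text.lower()
    return not any(marker.lower() in lowered for marker in spam_markers)


def last_clean_revision(history: List[Revision],
                        spam_markers: List[str]) -> Optional[Revision]:
    """Walk the history from newest to oldest and return the first
    revision without spam, or None if every revision is tainted."""
    for rev in reversed(history):
        if is_clean(rev.text, spam_markers):
            return rev
    return None


def revert(history: List[Revision], spam_markers: List[str],
           editor: str) -> Optional[Revision]:
    """Append a new revision that restores the last clean text.
    The spammed revisions remain in the history."""
    clean = last_clean_revision(history, spam_markers)
    if clean is None:
        return None
    restored = Revision(history[-1].number + 1, editor, clean.text)
    history.append(restored)
    return restored


if __name__ == "__main__":
    history = [
        Revision(1, "alice", "Project notes."),
        Revision(2, "bob", "Project notes, typo fixed."),
        Revision(3, "spambot", "Project notes, typo fixed.\n"
                               "* [http://casino.example.com/ notes]"),
        Revision(4, "spambot", "Links: http://casino.example.com/win"),
    ]
    restored = revert(history, ["casino.example.com"], editor="carol")
    print(restored.number, restored.text)  # 5 Project notes, typo fixed.
    print(len(history))                    # 5
```

Note that the revert adds a revision rather than removing any, which is why spam can linger in a page's history even after the visible page is clean.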

Another factor that attracts spammers is that a wiki's registration and contribution process needs to be as streamlined and simple as possible; if it isn't, legitimate users will become frustrated and go elsewhere. Ironically, a wiki's greatest vulnerability is also its greatest strength: the openness and ease of modification that make wikis useful also allow irresponsible and criminal users to add bogus content and dilute the value of the shared resource.

There seems to be agreement that "wiki spam is a wikiwide problem that needs to be solved wikiwide." Furthermore, wiki spam has to be fought with different approaches than email or blog comment spam, because both of those are subject to bottlenecks in a way that wikis are not. There is also a community element to address: it's not just a case of putting in a Bayesian filter, implementing something like greylisting, or setting up a blacklist. You have to consider how a particular tool will be used, and misused, by a varied (very varied) community. For example, if you set up a blacklist as a wiki page and let members submit blackhat IPs so as to take advantage of community support, how do you stop people from using that tool to take revenge on their enemies in the flame war du jour?
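A community-maintained IP blacklist of the kind described above might be checked with something like the following Python sketch. It assumes a hypothetical page format (one IP address or CIDR range per line, with `#` starting a comment) and uses only the standard `ipaddress` module; the page contents and function names are illustrative.

```python
import ipaddress
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical contents of a community-maintained blacklist page:
# one IP address or CIDR range per line; "#" starts a comment.
BLACKLIST_PAGE = """
# Entries submitted by wiki members
203.0.113.7        # a single address
198.51.100.0/24    # a whole range
2001:db8::/32
this line is a typo
"""


def parse_blacklist(page_text: str) -> List[Network]:
    """Convert the page text into network objects, skipping comments,
    blank lines, and malformed entries instead of raising."""
    networks: List[Network] = []
    for line in page_text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        try:
            # strict=False accepts host bits set, e.g. "198.51.100.9/24".
            networks.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            continue  # ignore typos or vandalized lines
    return networks


def is_blacklisted(editor_ip: str, networks: List[Network]) -> bool:
    """True if the editor's address falls inside any listed network.
    Raises ValueError if editor_ip is not a valid IP address."""
    addr = ipaddress.ip_address(editor_ip)
    # Membership tests between IPv4 and IPv6 objects simply return False.
    return any(addr in net for net in networks)


if __name__ == "__main__":
    nets = parse_blacklist(BLACKLIST_PAGE)
    print(len(nets))                              # 3
    print(is_blacklisted("198.51.100.42", nets))  # True
    print(is_blacklisted("192.0.2.1", nets))      # False
```

Malformed lines are skipped so that a typo or a vandalized entry can't break every edit on the wiki. The sketch deliberately leaves open the harder, social question raised above: who may add entries, and how those entries are reviewed.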

The best we can do right now is put a toolkit together and see what works. The current toolset consists of three main categories:

So, when you discover your wiki has been spammed, what should you do (after you remove the spam, of course)?

Naturally, there are a lot of wiki pages discussing the various evils of, and responses to, wiki spam. I maintain my own page on the subject. Chongqed.org has a good list of resources, and the Meatball wiki has a fairly good article with background and definitions.

Like everything else about the world of wikis, the discussion of spam fighting is both extensive and opinionated. You may have to do some digging to get your particular questions answered, but persevere. Despite the overabundance of rhetoric, there are numerous skilled individuals trying to protect their personal investment and community from misuse.

Luckily, most wiki spammers are fairly crude in their efforts and therefore easy to spot and block. However, that could change quickly as wiki spammers become more subtle and learn to get around current impediments, such as blacklists, by using the same techniques email spammers use. When that happens, we'll have to update our tools and create newer, more sophisticated ones.

Unfortunately, there is no clear victory in sight. The best we can hope for is to react quickly, utilizing our community support and rapid dissemination of information, to defeat each new round of wiki spammers.

Read in the original layout at: http://www.linux.com/archive/articles/45848