Wiki spam usually consists of lists of links inserted into standard pages, such as user profiles or sandbox areas. Although much of it uses innocent-looking link text, the actual URLs point to phishing, gambling, or porn sites. By putting out hundreds or thousands of spam pages, wiki spammers multiply the links pointing at their sites, which gives people more chances to click and, because search engines count inbound links, bumps those sites higher in the search rankings.
A list of reported wiki spam incidents and a Google search on known wiki spam names show that this is a growing problem.
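Because this kind of spam is mostly a burst of new outbound links, one cheap first-line check is to count how many previously unseen URLs a single edit adds. The following is a minimal sketch, not any particular wiki engine's mechanism; the URL pattern, the `MAX_NEW_LINKS` threshold, and the function names are hypothetical and would need tuning against real traffic.

```python
import re

# Hypothetical threshold: the number of *new* external links a single edit
# may add before it is held for review. Tune it for your own wiki.
MAX_NEW_LINKS = 10

# Rough pattern for http(s) URLs in wiki markup; it stops at whitespace,
# quotes, angle brackets, and the closing brackets of wiki link syntax.
URL_RE = re.compile(r"""https?://[^\s\]\)<>"']+""", re.IGNORECASE)


def external_links(text: str) -> set[str]:
    """Return the distinct http(s) URLs found in a page's wiki markup."""
    return set(URL_RE.findall(text))


def added_links(old_text: str, new_text: str) -> set[str]:
    """URLs present in the new revision but absent from the old one."""
    return external_links(new_text) - external_links(old_text)


def looks_like_link_spam(old_text: str, new_text: str,
                         limit: int = MAX_NEW_LINKS) -> bool:
    """Flag an edit that adds more than `limit` previously unseen URLs."""
    return len(added_links(old_text, new_text)) > limit
```

For example, an edit that appends 25 distinct `http://casinoN.example/` links to a sandbox page would be flagged, while an edit adding a single reference link would not. Counting only *added* URLs means long-standing resource lists already on a page are not penalized.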
The wiki community has been grappling with spam for some time. The theory in the past has been that wikis are self-maintaining, in the sense that community editing will remove or polish entries over time. That may not be true. For one thing, a spam posting may go unnoticed for a long time, especially if it's posted to a "ghost wiki" that is running more or less unattended. Even when it is noticed, cleaning up the spam is more work than simply deleting an offending page, because wikis retain revision histories by default. Complicating the problem even for popular wikis, legitimate users are less likely to participate in a wiki where many of the postings are bogus and link to unappealing and dangerous sites, which means there will be fewer people to notice the spam and help clean it up.
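Because the history is kept, cleanup usually means restoring the last good revision rather than deleting the page. The sketch below shows that step, similar in spirit to the rollback function some wiki engines provide. It assumes the spammer's edits are the most recent ones on the page; the `Revision` record and the `rollback_target` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Revision:
    """One saved version of a page (hypothetical storage model)."""
    author: str  # user name or IP address of the editor
    text: str    # full page text after this edit


def rollback_target(history: List[Revision]) -> Optional[Revision]:
    """Return the newest revision not made by the page's most recent editor.

    `history` is ordered oldest to newest. Every consecutive trailing edit by
    the same author (often one spammer making several passes) is skipped.
    Returns None when that author wrote every revision: the page was created
    by the spammer and can simply be deleted.
    """
    if not history:
        return None
    last_author = history[-1].author
    for rev in reversed(history):
        if rev.author != last_author:
            return rev
    return None
```

Saving the returned text as a new revision removes the spam from the live page, but the spam typically remains in the history unless the engine lets an administrator purge old revisions.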
Another factor that attracts spammers is that a wiki's registration and contribution process needs to be as simple and streamlined as possible. If it's not, legitimate users will become frustrated and go elsewhere. Ironically, a wiki's greatest vulnerability is also its greatest strength: the openness and ease of modification that make wikis useful also allow the irresponsible and the criminal to add bogus content and dilute the value of the shared resource.
There seems to be agreement that "wiki spam is a wikiwide problem that needs to be solved wikiwide." Furthermore, wiki spam has to be fought with different approaches than email or blog comment spam; both of those are subject to bottlenecks in a way that wikis are not. There is also a community element that has to be addressed: it's not just a case of putting in a Bayesian filter, implementing something like greylisting, or setting up a blacklist. You have to consider how a particular tool can be used and misused by a varied (very varied) community. For example, if you set up a blacklist as a wiki page and allow wiki members to submit blackhat IPs so as to take advantage of community support, how do you stop people from using this tool to take revenge on their enemies in the flame war du jour?
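The blacklist-as-a-wiki-page idea can be sketched as follows. This assumes a format similar to the shared regex blacklists some wiki engines keep on an ordinary page: one regular expression per line, with `#` starting a comment. The function names are hypothetical, and the sketch does nothing about the trust problem just raised; anyone who can edit the page can add entries, so in practice the page would need protection or review.

```python
import re


def parse_blacklist(page_text: str) -> list[re.Pattern]:
    """Compile one regex per line of a blacklist page.

    Text after '#' is treated as a comment, so entries cannot contain '#'.
    Blank lines are ignored.
    """
    patterns = []
    for line in page_text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        try:
            patterns.append(re.compile(entry, re.IGNORECASE))
        except re.error:
            # Skip a malformed entry (a typo or vandalism) instead of letting
            # it break every save on the wiki.
            continue
    return patterns


def blocked_by(new_text: str, patterns: list[re.Pattern]) -> list[str]:
    """Return the blacklist entries that match the submitted page text."""
    return [p.pattern for p in patterns if p.search(new_text)]
```

A save handler would call `blocked_by` on the submitted text and reject the edit if the result is non-empty; returning the matching entries lets the wiki tell a legitimate user which rule tripped.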
The best we can do right now is put a toolkit together and see what works. The current toolset consists of three main categories:
So, when you discover your wiki has been spammed, what should you do (after you remove the spam, of course)?
Naturally, there are a lot of wiki pages discussing the various evils of and responses to wiki spam. My own page is here. Chongqed.org has a good list of resources, and the Meatball wiki has a fairly good article with background and definitions.
Like everything else about the world of wikis, the discussion of spam fighting is both extensive and opinionated. You may have to do some digging to get your particular questions answered, but persevere. Despite the overabundance of rhetoric, there are numerous skilled individuals trying to protect their personal investment and community from misuse.
Luckily, most wiki spammers are fairly crude in their efforts and therefore easy to spot and block. However, that could change quickly as wiki spammers become more subtle and learn to get around current impediments, such as blacklists, using the same techniques email spammers do. When that happens, we'll have to update our tools and build more sophisticated ones.
Unfortunately, there is no clear victory in sight. The best we can hope for is to react quickly, utilizing our community support and rapid dissemination of information, to defeat each new round of wiki spammers.
The various parasites that flourish on the web haven't even started to attack Wikis yet, probably because not enough millions of naive users are looking at Wikis yet. When they do, Wikis will go the way of Usenet.
I've been tracking email spam by ASN for some time. The basic theory is that there are some places from which abuse is far more likely to come than others.
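Tracking abuse by ASN comes down to counting incidents per originating network once each source IP has been mapped to its AS. A minimal sketch follows; it assumes that mapping has already been done, and `asn_distribution` is a hypothetical helper name.

```python
from collections import Counter


def asn_distribution(spam_asns):
    """Rank origin ASNs by the number of spam incidents attributed to each.

    Returns (asn, count, share) tuples, most frequent first, where share is
    that ASN's fraction of all incidents.
    """
    counts = Counter(spam_asns)
    total = sum(counts.values())
    return [(asn, n, n / total) for asn, n in counts.most_common()]
```

Ranking by share makes a dominant source stand out immediately, which is what justifies blocking a whole AS rather than chasing individual addresses.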
I'm finding that the same rule applies to Wikis, as I also admin TWikIWeThey (http://twiki.iwethey.org/). In our case, it's AS4134 (China Telecom) which has been the overwhelming source of spam. The entire AS (you can get assignments from the CIDR Report, http://www.cidr-report.org/) is now null-routed at the server.
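The comment doesn't say how the AS was null-routed. One common way on a Linux server is an iproute2 `blackhole` route per announced prefix; the sketch below assumes that mechanism, takes a prefix list such as one copied from the CIDR Report, and also offers an application-level membership check. The helper names are hypothetical, and any real run needs the AS's actual announced prefixes.

```python
import ipaddress


def load_prefixes(lines):
    """Parse CIDR prefixes, one per line; blank lines and '#' comments are skipped."""
    nets = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            # strict=False tolerates entries with host bits set, e.g. 192.0.2.1/24
            nets.append(ipaddress.ip_network(line, strict=False))
    return nets


def is_blocked(ip, nets):
    """True if the address falls inside any blocked prefix."""
    addr = ipaddress.ip_address(ip)
    # Membership between mismatched IP versions simply returns False.
    return any(addr in net for net in nets)


def blackhole_commands(nets):
    """Build iproute2 commands that null-route each prefix."""
    cmds = []
    for net in nets:
        family = "-6 " if net.version == 6 else ""
        cmds.append(f"ip {family}route add blackhole {net}")
    return cmds
```

Each generated line, for instance `ip route add blackhole 192.0.2.0/24` (a documentation range standing in for a real prefix), must be run as root. Routes added this way do not survive a reboot unless they are also written into the system's network configuration.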
Looking over the spam reports at the Portland Pattern Repository, I'm finding a pretty familiar AS distribution. Frequency and AS follow:
To map an IP address to an AS, you can query the reverse DNS server at asn.routeviews.org and read the TXT field. See the Routeviews Project homepage (http://www.routeviews.org/) for more information.
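The lookup described here follows the reverse-DNS convention: the IPv4 octets are reversed and prefixed to the zone, and the answer arrives as a TXT record. The sketch below builds the query name and parses an answer, leaving the actual DNS query to a tool such as `dig`. It assumes the TXT answer carries the ASN, the network address, and the prefix length as three quoted strings, as in `"64496" "192.0.2.0" "24"` (a documentation-range example, not real output); check that layout against a live query before relying on it. The function names are hypothetical, and IPv6 is not handled.

```python
import ipaddress

# Zone named in the comment above. The response layout parsed below is an
# assumption and should be checked against a live query, for example:
#   dig +short 1.113.0.203.asn.routeviews.org TXT
ZONE = "asn.routeviews.org"


def asn_query_name(ip: str, zone: str = ZONE) -> str:
    """Build the name to look up for an IPv4 address.

    The octets are reversed, as in in-addr.arpa lookups, so 203.0.113.1
    becomes 1.113.0.203.asn.routeviews.org.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def parse_asn_txt(txt: str) -> tuple[int, str]:
    """Parse a TXT answer assumed to look like: "64496" "192.0.2.0" "24".

    Returns the origin ASN and the matching prefix in CIDR notation.
    """
    fields = txt.replace('"', " ").split()
    if len(fields) < 3:
        raise ValueError(f"unexpected TXT answer: {txt!r}")
    asn, network, length = fields[0], fields[1], fields[2]
    return int(asn), f"{network}/{length}"
```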
rel=nofollow
Posted by: JelleB on June 28, 2005 11:39 PM
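The `rel="nofollow"` suggestion refers to a link attribute, introduced by the major search engines in early 2005, that asks crawlers not to credit a link toward its target's ranking; that removes much of the search-engine payoff that motivates wiki spam in the first place. A minimal sketch of applying it when a wiki renders links, assuming the engine builds its own anchor tags; `render_link` and its `site_host` parameter are hypothetical.

```python
from html import escape
from urllib.parse import urlparse


def render_link(url: str, label: str, site_host: str) -> str:
    """Render a wiki link as HTML, marking off-site links rel="nofollow".

    Relative links and links to `site_host` itself stay followable; any other
    host (including subdomains of site_host) is treated as external.
    """
    host = urlparse(url).hostname or ""  # hostname is lowercased by urlparse
    external = bool(host) and host != site_host.lower()
    rel = ' rel="nofollow"' if external else ""
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(label)}</a>'
```

Marking only off-site links keeps the wiki's own pages fully crawlable. This removes the ranking incentive but not the spam itself, so it complements rather than replaces cleanup.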