In a recent article published in the October issue of IEEE Security and Privacy, S. ABU-NIMEH and T. CHEN studied so-called blog spam. Blog spam is the practice of posting spam comments that are totally irrelevant to the topic. There are several categories:
- Comment spam tries to corrupt the feedback of the community. Often the work of trolls, it is not very problematic. This is the price of democracy and Web 2.0.
- Term spam adds words to a page so that it appears more relevant to search queries.
- Link spam contains links to a site in order to increase the number of pages pointing towards the spamming site, thus increasing its famous PageRank.
- Splogs, or spam blogs, are fake blogs whose sole purpose is to increase the PageRank of a given site.
The study showed that the practice is increasing. Of more than one million collected comments, 75% were spam! They were issued from a limited number of email addresses and IP addresses.
Researchers are trying to build classifiers that detect blog spam. They are not yet accurate.
Meanwhile, there are a few lines of defense:
- Blacklists of email and IP addresses
- Blacklists of words
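To make these two defenses concrete, here is a minimal sketch of a blacklist filter in Python. Everything in it is an assumption made for illustration: the function name `is_blocked`, the three sets and their entries are hypothetical, and a real blog engine would load its lists from configuration and hook the check into its own comment pipeline.

```python
import re

# Hypothetical blacklists; the entries are illustrative only.
BLOCKED_WORDS = {"codeine", "valium"}
BLOCKED_EMAILS = {"spammer@example.com"}
BLOCKED_IPS = {"203.0.113.7"}


def is_blocked(text, email, ip):
    """Return True if the comment matches any of the blacklists."""
    # Sender check: compare the normalized email and the raw IP string.
    if email.strip().lower() in BLOCKED_EMAILS or ip in BLOCKED_IPS:
        return True
    # Word check: split the lowercased text into alphanumeric tokens
    # and look for any token that appears in the word blacklist.
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return not words.isdisjoint(BLOCKED_WORDS)
```

Such a filter only matches exact tokens, so a spammer who misspells a word or switches to a new address passes through, which is probably why blacklists slow spam down rather than stop it.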
At the end of 2010, I experienced this damned attack on this site. In one night, I could find several tens of spam comments on a single topic. It even reached 300 in one night. At the beginning, I was lenient (you may still find some of them) and cleaned up the mess. Then it started to become worrying. The default installation of my blog provides a basic anti-spam test that asks for the answer to a simple arithmetic calculation. It did not seem to be deterrent enough. Then I started to blacklist some words such as codeine, Valium or hemorrhoid. This is not the usual vocabulary of security. It reduced the number of comments but did not stop them. My last solution was to use a CAPTCHA. CAPTCHAs are not user friendly, and may even deter some people from posting comments. Nevertheless, it seems to have (temporarily?) stopped the spammers.
By the way, this issue of IEEE Security and Privacy also has an excellent paper by Teddy Furon and Gwenael Doerr about “tracing pirated contents on the Internet”.