stk's dirty dozen ( fighting comment spam )

started by on May 11, 2006 – Last touched: May 11, 2006

May 11, 2006 04:31    

This was written as a response to solcist from [url=http://forums.b2evolution.net/viewtopic.php?t=7492]this post[/url], but then, I made it longer and thought others might benefit, so I've posted it separately.

With the central blacklist in a state of flux and a full 7 months since the last production release, there is a heightened concern regarding comment/trackback SPAM. The beta release should help alleviate these problems, but it's not out yet.

A mate and I are conducting some tests regarding spam and when we're done, we'll be posting information [url=http://astonishme.co.uk]HERE[/url]. We're encouraged by what we've seen and been able to accomplish, but (as yet) are unsure how it's all going to come together, be deployed, integrated, etc.

In the meantime, here are some methods that I'm aware of and a guesstimate about their effectiveness, based on some of our work and the work of other b2evolution developers and moderators.

1) htaccess (not allow offsite comments) - Easy to do, good in theory, but only partially effective (spammers often spoof you as the referrer or don't have a referrer, which defeats the method). [url=http://randsco.com/index.php/2005/11/18/anti_spam_script]Instructions (#2)[/url]

2) htaccess (IP blocking) - Easy to do, but reactive. Spammers often switch IPs, so you'll soon be plugging holes AFTER the fact and ending up with a long list of obsolete IPs. Good if your hits are coming from the same place, however. [url=http://forums.b2evolution.net/viewtopic.php?t=4427]INFO[/url]

3) Renaming htsrv folder - (manually or [url=http://randsco.com/index.php/2005/11/18/anti_spam_script]automatically[/url]). Not effective. Spammers are now parsing for file names contained within the folder, so they find the folder regardless of its name.

4) Renaming comment script - Seems to be the best of the "cheap" anti-spam methods. Haven't seen a script yet to do this automatically, but that's not a bad idea. [url=http://randsco.com/index.php/2005/11/18/anti_spam_script]Instructions (#3)[/url]

5) EdB's Turing Test Hack - Simple and effective. Ask an idiot-proof question (e.g. - what color is purple?) as a means of defeating automatic SPAMMING. Does require visitors to jump over a hurdle, but it's the size of a rabbit. [url=http://forums.b2evolution.net/viewtopic.php?t=7471]HACK[/url]

6) CAPTCHA - Same concept, but an image is shown (with letters, numbers, words or some combination) and visitors must type in what they see. I've heard it's been defeated, and the images can sometimes be difficult to read ("ones" and "els", "zeros" and "Ohs" are easily confused). It's a larger hurdle than a rabbit. [url=http://forums.b2evolution.net/viewtopic.php?t=2976]HACK[/url] or [url=http://manual.b2evolution.net/CreatingAntispamPlugin]v1.8 plugin[/url]

7) Restrict comments to members - Fine for blog communities. Some concessions may be allowed for visitors to comment, but by and large, restrictive to anonymous visitors. [url=http://forums.b2evolution.net/viewtopic.php?t=5343]HACK info[/url]

8) *NEW* HTTPS for htsrv - Blueyed (a b2evo developer) suggests setting the $htsrv_url variable to an absolute HTTPS URL in [url=http://forums.b2evolution.net/viewtopic.php?t=7757]this thread[/url] ... untested by me, utilization depends on server settings (may not work on your host).

9) b2evo blacklist - Good sharing system, long list of obsolete URLs, often an after-the-fact method. (I don't know what the current status is, as I've heard that the list is [url=http://b2evolution.net/news/2006/05/02/the_centralized_antispam_backlist]looking for a new host[/url] and we currently don't use it on our site).

10) Comment Moderation - In development. Allows you to approve comments before they're listed to the blog. Kind of a pain for legit comments as there is a lag time between a visitor posting and you having a chance to approve it. Can be time intensive if you get lots of comments.

11) SPAM Karma - [url=http://manual.b2evolution.net/CreatingAntispamPlugin]In development[/url]. The idea is to assign weighting factors to various comment attributes (IP address, author, keywords, URL and content). Each comment is then assigned a value, based on a weighted average (called its "karma"). Values below a certain amount are automatically posted to the blog, values above a certain amount are blocked as SPAM and in-between values are set aside for moderation. (At least, that's how I *think* it's going to work, as I'm not a developer.)

12) ¥åßßå's Spam Hound - (In development) This is my mate's prototype SPAM system, which is still in its nascency. However, his hound has munched over 26,000 spam comments and fewer than a dozen have even made it to moderation. (Read: There is HOPE!) Our plan is to turn the tables on spammers. We'll know we've been successful when it's the SPAMMERs that are using a blacklist and it's FILLED with b2evolution blog URLs. ;) [url=http://www.innervisions.org.uk/babbles/index.php?p=583&more=1&c=1&tb=1&pb=1]Read about the Hound[/url]
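
To make the scoring idea in (11) a bit more concrete, here is a rough sketch of how a weighted "karma" value with two thresholds might work. It is written in Python purely for illustration and is not the actual b2evolution plugin code. The weights, the thresholds and the function names are all made-up assumptions, and each attribute score is assumed to already be a number between 0 (clean) and 1 (spammy).

```python
# Hypothetical weights for the attributes named in item 11.
WEIGHTS = {"ip": 0.2, "author": 0.1, "keywords": 0.3, "url": 0.2, "content": 0.2}

# Hypothetical thresholds on a 0.0 (clean) to 1.0 (spammy) scale.
PUBLISH_BELOW = 0.3
BLOCK_ABOVE = 0.7


def karma(scores):
    """Weighted average of per-attribute spam scores (missing ones count as 0)."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)
    return weighted / total_weight


def decide(scores):
    """Map a karma value onto the three outcomes described in item 11."""
    value = karma(scores)
    if value < PUBLISH_BELOW:
        return "publish"
    if value > BLOCK_ABOVE:
        return "block"
    return "moderate"
```

With these made-up numbers, a comment that only trips the keyword and URL checks lands in the middle band and goes to moderation, while one that trips everything is blocked outright.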

If you know of any other methods, please pipe up and provide a general description and link. Thanks.

Hope this helps.

PS - Regarding trackback | pingback | referrer spam. My recommendation is to simply turn off trackbacks and pingbacks. (Seldom used features, by and large.) For referrer spam, the best method is to NOT show stats on your site (and delete the stats.php file). Bing-batta-boom ... problem solved.

May 13, 2006 05:52

I woke up today and decided I needed to work on spam in my blog.

Thanks for this helpful list!

Jun 20, 2006 11:47

thank you for this nice overview!
it's exactly what I needed!

Jun 24, 2006 12:27

I've had to do a few hacks to my site to cut down on the number of spam comments - I've had 25,000 attempts this month and only a handful have got through, and they're only the 'beautiful site, nice design' ones that seem to be manually entered.

- check for duplicate email address or comment contents, 2 minutes and 24 hours respectively, and block the comment if necessary.

- removed the URL field from the form, and block any comments that are submitted with a URL. This doesn't seem to have reduced the number of genuine comments, and if someone doesn't want to post because they can't advertise their own site it's effectively spam anyway. I've got a special pass-phrase, and if the author name field contains the phrase it sets the name and URL to mine so people can identify where I've responded to comments.

- block any comments containing URLs by searching for http, www and a few top level domains (.com, .pl etc). This may not suit your site, but I've not had any problems, and I make it clear on the form that they're not allowed.

- maintain an array of keywords used in spam comments, e.g. phentermine, viagra etc, and block any comments that contain them.
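
For anyone wondering what checks like these might look like, here is a rough sketch. It is Python for illustration only, not the actual comment_post.php hacks, and every name in it is made up. It assumes "2 minutes and 24 hours respectively" means the same email address within 2 minutes and the same comment text within 24 hours, and it keeps its history in memory where a real blog would use the database.

```python
import re
import time

SPAM_KEYWORDS = ["phentermine", "viagra"]  # the two examples named above
# Crude URL detection: http, www and a couple of top level domains.
URL_PATTERN = re.compile(r"http|www|\.com\b|\.pl\b", re.IGNORECASE)
EMAIL_WINDOW = 2 * 60           # seconds: same email twice within 2 minutes
CONTENT_WINDOW = 24 * 60 * 60   # seconds: same text twice within 24 hours


class CommentFilter:
    def __init__(self):
        self.last_email = {}    # email -> time of last accepted comment
        self.last_content = {}  # normalised text -> time of last accepted comment

    def check(self, email, url, content, now=None):
        """Return the reason a comment is blocked, or None if it is accepted."""
        now = time.time() if now is None else now
        text = content.strip().lower()
        if url:
            return "url field filled"
        if URL_PATTERN.search(content):
            return "url in content"
        if any(word in text for word in SPAM_KEYWORDS):
            return "spam keyword"
        seen = self.last_email.get(email)
        if seen is not None and now - seen < EMAIL_WINDOW:
            return "duplicate email"
        seen = self.last_content.get(text)
        if seen is not None and now - seen < CONTENT_WINDOW:
            return "duplicate content"
        # Accepted: remember it for the duplicate checks.
        self.last_email[email] = now
        self.last_content[text] = now
        return None
```

The URL pattern is deliberately blunt (it will also catch an innocent "awww"), which matches the trade-off described above: fine if the form makes it clear that URLs are not allowed.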

All this means I don't have to constantly keep an eye on it, and users don't have to jump through hoops to leave a comment. And if any of the 'nice site, good content' comments slip through people will just think others are impressed by my site :)

I haven't posted any of the code because I had so many hacks in place before doing all this I'd not been able to upgrade to anything like the most current version, so I don't know if it would still fit (and some may already be in the latest versions). They're mostly in comment_post.php though and it should be fairly easy to work out what to add, and where.

Jun 25, 2006 22:09

s7uar7 wrote:

[quote]I've had to do a few hacks to my site to cut down on the number of spam comments - I've had 25,000 attempts this month and only a handful have got through, and they're only the 'beautiful site, nice design' ones that seem to be manually entered.[/quote]

well, that's a pretty nice success ...

[quote]- check for duplicate email address or comment contents, 2 minutes and 24 hours respectively, and block the comment if necessary.[/quote]

by hand or how? I'm trying to host blogs for customers, so there should be a way to get this done automatically ...

[quote]- removed the URL field from the form, and block any comments that are submitted with a URL. This doesn't seem to have reduced the number of genuine comments, and if someone doesn't want to post because they can't advertise their own site it's effectively spam anyway.[/quote]

right. people use that field to get more incoming links to their website for a better PR and stuff ...

[quote]I've got a special pass-phrase, and if the author name field contains the phrase it sets the name and URL to mine so people can identify where I've responded to comments.[/quote]

nice ... where is your blog located? I would like to have a look ...

[quote]- block any comments containing URLs by searching for http, www and a few top level domains (.com, .pl etc). This may not suit your site, but I've not had any problems, and I make it clear on the form that they're not allowed.[/quote]

good idea ...

[quote]- maintain an array of keywords used in spam comments, e.g. phentermine, viagra etc, and block any comments that contain them.[/quote]

I can't do this by myself every day; there must be an automated way somehow. The central anti-spam list is a beginning, but I still have to click 'update' manually - why doesn't it do it every 24hrs automatically via cron or something?

[quote]All this means I don't have to constantly keep an eye on it, and users don't have to jump through hoops to leave a comment. And if any of the 'nice site, good content' comments slip through people will just think others are impressed by my site :)[/quote]

yeah, right :D ...

[quote]I haven't posted any of the code because I had so many hacks in place before doing all this I'd not been able to upgrade to anything like the most current version, so I don't know if it would still fit (and some may already be in the latest versions). They're mostly in comment_post.php though and it should be fairly easy to work out what to add, and where.[/quote]

okay, I will - thanks a lot for these posts here!

