
Trimming the antispam list - an experiment

Started by on Apr 04, 2006 – Contents updated: Apr 04, 2006

Apr 04, 2006 02:10    

Had me an idea the other day that I finally acted on. I set my settings table antispam_last_update field to 2001-01-01 00:00:00 (which is the default value for "never updated") then emptied my antispam table. I then asked for the update and of course got the first 1000 published keywords. I then did my great experiment: I emptied the antispam table again. I then asked for the update 4 more times to get the rest of the published keywords.
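To make the effect of that sequence concrete, here is a rough simulation, not b2evolution code. It assumes the central server simply hands out keywords oldest-first in batches of 1000, as described above; the function names, the list standing in for the server, and the total of 4400 keywords are all made up for illustration.

```python
# Hypothetical simulation: the central server is modelled as a plain list that
# hands out at most BATCH_SIZE keywords past the client's current position.
BATCH_SIZE = 1000  # batch size taken from the post above


def request_update(central, position):
    """Return the next batch of keywords and the new position marker."""
    batch = central[position:position + BATCH_SIZE]
    return batch, position + len(batch)


def run_experiment(central):
    local = []      # stands in for the local antispam table
    position = 0    # stands in for antispam_last_update reset to "never updated"

    # First update: receive the oldest batch, then empty the table again.
    batch, position = request_update(central, position)
    local.extend(batch)
    local.clear()

    # Keep asking until the server has nothing further to send.
    while True:
        batch, position = request_update(central, position)
        if not batch:
            break
        local.extend(batch)
    return local


central_list = [f"keyword{i}" for i in range(4400)]  # made-up size and names
trimmed = run_experiment(central_list)
print(len(central_list) - len(trimmed))  # how many old keywords are now missing
```

The point of the sketch is only that the oldest batch never makes it back into the local table, while everything published after it does.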

Thus my antispam table is 1000 entries shorter than that of anyone else who happens to have the full list as of this exact moment. I will be undoing the little bits I do with my .htaccess file so's my blog is (effectively) wide open to referer spammers. Those would be the bit that searches the referer for partial matches and the bit that says my comment post form has to be referred from my own domain. The only thing I'll have intact is my simple Turing test for commenters.

My point here is to see if any of these old spammers are still active. In truth I already know (or strongly suspect) that quite a lot of them were NEVER active and were published by aggressive antispam administrators who didn't foresee a future with thousands of keywords and more added daily.

Anyway I figure I'll let it run this way for the month of April. Anyone else wanna join me in this test? Give it a month before reporting anything and we'll have a really good idea of what keywords down in the belly of the beast don't need to be keywords anymore.

Apr 04, 2006 22:09

This is a good experiment, though I cannot help out really: I have a firewall, custom hacks et al.

But IMHO one day the Antispam blacklist should "just" become an Antispam plugin too, with some enhancements to the current behaviour (re-checking of existing entries, for example), and especially a count of the number of blocks caused by each entry. This could then even get reported back to the central list somehow.

Apr 06, 2006 09:43

I went one stage further than you 8|

At the beginning of March I totally emptied my blacklist, and instead started building my own custom list which also logs the date of last activity (as long as it's been used after March 16th, which was when I thought to add it :p).

I also keep a record of all of the spam comments in a copy of the normal comments table, so that I can analyse common trends.
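For anyone wanting to try the same kind of tracking, a minimal sketch of the idea might look like the following. It is not the code actually running on the blog described above: the `UrlStats` and `record_spam` names, the in-memory dictionary, and the plain substring match are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class UrlStats:
    attempts: int = 0                        # times this URL showed up in spam
    last_attempt: Optional[datetime] = None  # date of last activity


def record_spam(stats: Dict[str, UrlStats], comment_text: str,
                blocked_urls: List[str], when: datetime) -> List[str]:
    """Update the counters for every blocked URL found in the comment.

    Returns the list of blocked URLs that matched (empty if none did).
    """
    text = comment_text.lower()
    matched = [url for url in blocked_urls if url in text]
    for url in matched:
        entry = stats.setdefault(url, UrlStats())
        entry.attempts += 1
        entry.last_attempt = when
    return matched


# Example with made-up URLs.
stats: Dict[str, UrlStats] = {}
blocked = ["example-pills.blogspot.com", "casino.example"]
record_spam(stats, "Cheap stuff at http://Example-Pills.blogspot.com",
            blocked, datetime(2006, 4, 6, 9, 43))
```

Keeping the count and the last-seen date per URL is what makes it possible to tell later which entries are still earning their place in the list.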

To date I have 128 URLs in my custom list (of which 44 are blogspot URLs >:-< ), and 10,280 comments in my evo_spam_comments table :lol:

If you want I can send you a link to a page which will show you all the URLs (and a few other things), the count of the times they attempted to spam me with them, and the date they last attempted.

¥

Apr 07, 2006 16:46

blueyed wrote:

... But IMHO one day the Antispam blacklist should "just" become an Antispam plugin ...

That'd be great. Like throw hooks into the antispam tab so that people could pick and choose whatever antispam plugin they like, including the keyword list.

¥åßßå wrote:

I went one stage further than you 8|

At the beginning of March I totally emptied my blacklist, and instead started building my own custom list which also logs the date of last activity ...

Cool. It dawned on me that my little experiment was grossly flawed. I had no way of knowing if a spammer was new or something that got through because of the mass deletion. Duh... Thus I emptied the entire table and started over.

I've noticed something that someone smart might be able to take advantage of: referer hits to pages like

myblog.php?blog=1&cat=21&page=1&disp=posts&paged=2

mean it's a spammer. People link to your blog root or a particular post - never to pages like that. At least not on my blog anyway. So I'm thinking a quick filter that says if there are 2 or more "&"s in the URL of the requested page, don't bother with it. Oh, that's only good for those who use clean URLs I guess, but since I do it works for me ;)
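The "2 or more ampersands" rule can be sketched in a few lines. This is only an illustration of the heuristic, assuming the check runs on the URL of the page being requested; the function name and the `max_ampersands` parameter are hypothetical, and as noted it only makes sense on a blog that uses clean URLs.

```python
from urllib.parse import urlsplit


def looks_like_spam_target(requested_url: str, max_ampersands: int = 1) -> bool:
    """Return True when the query string contains 2 or more '&' characters.

    With the default of max_ampersands=1, a single '&' is still tolerated.
    """
    query = urlsplit(requested_url).query
    return query.count("&") > max_ampersands


print(looks_like_spam_target("myblog.php?blog=1&cat=21&page=1&disp=posts&paged=2"))
print(looks_like_spam_target("/index.php/2006/04/04/some-post"))
```

The first call uses the URL from the post and is flagged; the second is a made-up clean URL with no query string and is let through.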

May 02, 2006 18:02

Time for a quick summary.

In total (since March) I've added 380 spam urls to my list (as opposed to the 4,400[ish] in the normal blacklist). 240 of these are blogspot urls.

Since I started keeping track of spam comments (about mid March) these 380 urls have stopped 26,000 spam.

For April (first complete month with stats), I had 19,200 spam which were stopped by 290 spam urls.

I've written it up in more detail on my blog, including a list of the 290 urls that spammed me in April.

From the limited stats that I have it would appear that the current blacklist could be severely pruned, but I'd need a lot more stats before I was sure.

It'll be interesting to see the results that you got.

¥

May 30, 2006 16:37

So what did you guys ever figure out? Should we dump our spam databases?

May 30, 2006 19:54

You can, but I wouldn't. In fact I undid my emptying and went down a different path. I now block certain keywords with my .htaccess file, then remove matching terms from my local antispam table. I shaved quite a few hundred off the list, but also ended up blocking a real actual blog that linked to me. Anyway, if you want to wipe out your local copy [url=http://forums.b2evolution.net/viewtopic.php?t=5394]this thread[/url] tells you how to do it. The first part - setting the settings table - will mean when you ask for the updated list you'll get everything from the beginning again. Use as you see fit eh? Another thing you can consider is hacking your installation with [url=http://forums.b2evolution.net/viewtopic.php?t=7912]this little gem[/url]. It won't do anything special about stopping spammers. All it does is make your antispam a bit friendlier for you.

BTW I'm an insider for the antispam central system. I get to publish keywords! I also got tired of an ever-growing list and came up with a method to shrink it by deleting published keywords that aren't blocking anything anymore. It's a long complicated thing but I'll try to summarize it. New reports have to come from a lot more reporters OR cover a lot of subdomains before we publish a keyword. That way we know it's an active spammer and not just somebody thinking it'd be cool to be a spammer. So the list grows with keywords that are likely to help rather than just grow. At the other end when a keyword is 2 years old we look at the database and see if the keyword is still being reported by anyone. If it didn't get a report in 2 years we delete it. We've removed probably 400 inactive keywords that way, and the program is only about a month old.
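The two rules in that summary can be written down roughly like this. It is a sketch of the policy as described, not the central system's actual code: the threshold values, the `Keyword` record, and the function names are all invented, and the real thing is by the poster's own account a lot more complicated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

TWO_YEARS = timedelta(days=730)
MIN_REPORTERS = 5    # hypothetical threshold, not stated in the post
MIN_SUBDOMAINS = 10  # hypothetical threshold, not stated in the post


@dataclass
class Keyword:
    text: str
    published_on: datetime
    last_reported_on: datetime


def should_publish(reporter_count: int, subdomain_count: int) -> bool:
    """Publish only if enough reporters OR enough subdomains are involved."""
    return reporter_count >= MIN_REPORTERS or subdomain_count >= MIN_SUBDOMAINS


def should_delete(keyword: Keyword, now: datetime) -> bool:
    """Delete a keyword that is at least 2 years old and unreported for 2 years."""
    old_enough = now - keyword.published_on >= TWO_YEARS
    silent = now - keyword.last_reported_on >= TWO_YEARS
    return old_enough and silent


# Example with made-up data: an old keyword nobody has reported since early 2004.
stale = Keyword("pills-example", datetime(2004, 1, 1), datetime(2004, 2, 1))
print(should_delete(stale, datetime(2006, 5, 30)))
```

The stricter publishing rule slows growth at the front of the list, while the two-year check trims the back, which is how the list can shrink even as new keywords keep arriving.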

May 31, 2006 09:57

I agree with EdB on this one: don't dump your antispam database unless you have other antispam measures... fair warning, huh?

The first time I binned all mine (after setting up a quick 'n' dirty comment moderation system) I ended up with a few thousand (moderated) comments for my troubles.

Right now I have something very similar to the blacklist running. The main difference is that it's only 10% the size of the central blacklist and isn't a yes/no answer (i.e., logged-in members can post some spammy stuff and it'll let it through).

The main idea for my experiment is seeing how many of the spam urls are actually still being used by spammers. I'll be posting my stats for May sometime over the next few days, and I'll do a quick summary here, but I can tell you now that the blacklist would have stopped over 1,000 spam a day if it was running on my blog.

¥

Jun 02, 2006 12:57

Quick summary for May :-

During May my blogs received a total of 33,000 spam which were stopped by just 47 URLs! This is an average of (approx) 700 spam comments for each URL.

It makes me wonder just how many more we would have got if my server hadn't had a few off days 8|

As usual I've written it up in more detail on my blog, including the list of the 47 urls that spammed us.

¥

Jul 05, 2006 12:13

Quick summary for June :-

During June my blogs received a total of 51,400 spam which were stopped by 102 URLs.

One of their new tricks is to spam you with search engine results links, as nobody is going to blacklist Yahoo right?

As usual it's written up in more detail on my blog including the list of the 102 urls that spammed us (including the yahoo one so don't add that one to your own blacklists).

¥

