1 Oct 30, 2006 18:21    

We had a major problem with spam (350K comments) and after the server went down under the weight of it all this weekend, we decided to get serious with the cleanup. We knocked out 250K by targeted SQL queries, and then went looking for a cleaner solution.

While wandering around the forums I saw EdB's experiment with deleting the entire anti-spam table (http://forums.b2evolution.net/viewtopic.php?t=7577). I tried it on our dev server: not deleting the table and leaving it gone, but deleting it and re-importing, to try to force a re-check of all the banned domains. The entries all imported correctly, but no comments were deleted. However, when I then manually re-checked the word 'zoloft', it found a comment to delete.

I had a whole list of expectations when I deleted our anti-spam entries, and the result didn't meet any of them. Is this a bug? I would expect a full re-import to trigger a full re-check of our comments and trackbacks. Is this not the case?

Thanks,
David

2 Oct 30, 2006 20:16

When new keywords get added to the antispam blacklist through an update, existing content doesn't get re-checked against them.

That would be a new feature, but the problem is that it would have to re-check in chunks, because the server would probably time out if it re-checked the whole antispam list against a lot of hitlog entries, comments, etc.
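To make the "in chunks" idea concrete, here is a rough sketch, not b2evolution code. It is written in Python with an in-memory SQLite database standing in for MySQL so that it runs on its own. The table and column names evo_antispam.aspm_string and evo_comments.comment_content are the ones used in this thread; comment_ID, the function names, and the chunk size are assumptions made up for illustration.

```python
import sqlite3


def recheck_chunk(conn, offset, chunk_size=1000):
    """Check one chunk of blacklist keywords against all comments.

    Returns (comments_deleted, next_offset). next_offset is None once the
    blacklist is exhausted.
    """
    keywords = [
        row[0]
        for row in conn.execute(
            "SELECT aspm_string FROM evo_antispam "
            "ORDER BY aspm_string LIMIT ? OFFSET ?",
            (chunk_size, offset),
        )
    ]
    if not keywords:
        return 0, None
    deleted = 0
    for keyword in keywords:
        # instr() is a plain substring test, so '%' or '_' inside a
        # keyword is not treated as a wildcard (unlike LIKE).
        cur = conn.execute(
            "DELETE FROM evo_comments "
            "WHERE instr(lower(comment_content), lower(?)) > 0",
            (keyword,),
        )
        deleted += cur.rowcount
    conn.commit()  # one short transaction per chunk
    return deleted, offset + len(keywords)


def demo():
    """Build a tiny hypothetical data set and re-check it two keywords at a time."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE evo_antispam (aspm_string TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE evo_comments "
        "(comment_ID INTEGER PRIMARY KEY, comment_content TEXT)"
    )
    conn.executemany(
        "INSERT INTO evo_antispam VALUES (?)",
        [("zoloft",), ("casino",), ("cheap-pills",)],
    )
    conn.executemany(
        "INSERT INTO evo_comments (comment_content) VALUES (?)",
        [
            ("Buy Zoloft now",),
            ("Nice post, thanks",),
            ("best casino bonus",),
            ("I agree with David",),
        ],
    )
    total, offset = 0, 0
    while offset is not None:
        deleted, offset = recheck_chunk(conn, offset, chunk_size=2)
        total += deleted
    remaining = [
        row[0]
        for row in conn.execute(
            "SELECT comment_content FROM evo_comments ORDER BY comment_ID"
        )
    ]
    return total, remaining
```

In a web setting each call to recheck_chunk would presumably be its own request, with the returned offset carried over to the next one, so that no single request runs long enough to time out.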

Patches are welcome. :)

3 Oct 30, 2006 21:05

Yeah - I had wondered about the whole server load thing - I thought that was why the blacklist only did 1K at a time.

So does anyone have a suggestion (and I'm sure I'm repeating an oft-asked question) for purging comments? Maybe a script floating around that you ran once and forgot about?

Failing that, how about this: we run MySQL, so I'm thinking of a simple procedure with a cursor, something along the lines of the following (adapted from http://forge.mysql.com/wiki/Cursors):


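-- the mysql command-line client needs a different delimiter while the body contains ';'
delimiter //

drop procedure if exists cursorproc
//
create procedure cursorproc()
begin

   declare delete_me varchar(80);
   declare l_loop_end INT default 0;

   declare cur_1 cursor for select aspm_string from evo_antispam;
   -- sqlstate '02000' (no data) is raised when the cursor runs out of rows
   declare continue handler for sqlstate '02000' set l_loop_end = 1;

   open cur_1;

   repeat

      fetch cur_1 into delete_me;
      -- skip the delete when the fetch found no more rows
      if l_loop_end = 0 then
         -- concat() inserts the fetched keyword; '%delete_me%' would only match that literal text
         delete from evo_comments where comment_content like concat('%', delete_me, '%');
      end if;

   until l_loop_end end repeat;

   close cur_1;

end
//

delimiter ;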

Please note that I made that up - does anyone know whether something along these lines (but syntactically correct, for example) would work?

4 Oct 30, 2006 21:17

I'm not into MySQL procedures, but it looks like it should work.

btw: the antispam list has nothing to do with the antispam plugin, so I'll change the subject.

5 Oct 31, 2006 00:15

What version of b2evolution are you using? If you're still back in the 0.9.2 generation you should upgrade AFTER you apply the hack I made that does check and clean your hitlog. Not when you get an update mind you, but when you tell it to. It will check every keyword ever imported.

Haven't done it for 1.8.2 yet. Really should be a plugin, would be okay as a hack, but it's neither yet :(

6 Oct 31, 2006 15:45

EdB wrote:

What version of b2evolution are you using?

We're at 1.8.2 - I didn't keep a copy of the old version except as a tar file, and I didn't figure the cleaning would matter.

Given that we've now implemented the captcha module, I'm not sure we need to keep any entries in the blacklist, honestly. On the other hand, given that we had 350K entries on our comment table, I don't think 4K entries on our antispam table will kill the server!

BTW - through a series of carefully chosen keyword deletes, we were able to knock out 340K of those entries in a morning.
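For anyone attempting the same kind of cleanup, here is a rough sketch of checking how many comments a keyword would hit before deleting on it. It is not what we actually ran. It uses Python with an in-memory SQLite database standing in for MySQL so it can be run standalone; comment_ID, the function names, and the sample rows are all made up for illustration.

```python
import sqlite3

# Substring match that ignores case; instr() does not treat '%' or '_' specially.
MATCH = "instr(lower(comment_content), lower(?)) > 0"


def preview_keyword_hits(conn, keywords):
    """Return {keyword: number of matching comments} without deleting anything."""
    hits = {}
    for keyword in keywords:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM evo_comments WHERE " + MATCH, (keyword,)
        ).fetchone()
        hits[keyword] = count
    return hits


def delete_keyword(conn, keyword):
    """Delete every comment containing keyword and return the number removed."""
    cur = conn.execute("DELETE FROM evo_comments WHERE " + MATCH, (keyword,))
    conn.commit()
    return cur.rowcount


def demo():
    """Preview three candidate keywords on hypothetical data, then delete on one."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE evo_comments "
        "(comment_ID INTEGER PRIMARY KEY, comment_content TEXT)"
    )
    conn.executemany(
        "INSERT INTO evo_comments (comment_content) VALUES (?)",
        [
            ("cheap zoloft here",),
            ("ZOLOFT online",),
            ("win big at the casino",),
            ("great article, thanks",),
        ],
    )
    hits = preview_keyword_hits(conn, ["zoloft", "casino", "thanks"])
    removed = delete_keyword(conn, "zoloft")
    (left,) = conn.execute("SELECT COUNT(*) FROM evo_comments").fetchone()
    return hits, removed, left
```

The preview step is the point: a keyword such as 'thanks' matches a legitimate comment in the sample data, which is the kind of thing worth seeing in the counts before any delete runs.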

David

