Antispam Bandwidth, revisited

Started by on Aug 13, 2005 – Contents updated: Aug 13, 2005

Aug 13, 2005 01:46    

[url=http://forums.b2evolution.net/viewtopic.php?p=21359]This Antispam Bandwidth thread[/url] has some really great information in it, but has grown very long. Here I try to capture what I consider the gems from that thread. By that I mean stuff that goes into your b2evolution installation - not your .htaccess file.

The whole idea is to cut the bandwidth that spammers use even when they are already in your antispam table. Normally what happens is they get a full page delivered, which is supposed to get them into your hitlog. The very last thing the page load does is add the hit to your hitlog, at which point the antispam table rejects the entry. They got your bandwidth, but not credit for being a 'referer'. The first hack in this thread checks the referer immediately, and if it is in your antispam table the request gets shut down with a 403 error. In other words, normally they eat a page before being denied, but with this they immediately get told that access is forbidden.

Put the following in your conf/hacks.php file. If you don't have one of those don't worry - just create it and b2evolution will read it.

Code

<?php
/**
* Bounce all referrers who are blacklisted
* Isaac Z. Schlueter
**/
if( !empty($_SERVER['HTTP_REFERER']) && strpos($_SERVER['HTTP_REFERER'],$baseurl) !== 0 )
{
  $is_a_spammer = $DB->get_row("select aspm_ID, aspm_string from  $tableantispam
      where '" . $DB->escape($_SERVER['HTTP_REFERER']) . "' like concat('%',aspm_string,'%')");
  if( $is_a_spammer ) {
    header('HTTP/1.0 403 Forbidden');
 
    // un-comment the next line of code to redirect back to the referring
    // page. I didn't do this, in the event that perhaps there is a false
    // positive, but you needn't be so kind.
    // In any event, the bandwidth is teensy either way.
    // header('Location: ' . $_SERVER['HTTP_REFERER']);
    ?>
    <html><head><title>Stop Referrer Spam!</title>
    </head><body>
    <p>You are being denied access to this page because you have been referred here by a
      known spammer: [<?php echo htmlspecialchars($_SERVER['HTTP_REFERER']) ?>].</p>
    <p>If you have reached this page in error, feel free to
      <a href="<?php echo $baseurl . $ReqURL ?>">bypass this message</a> with our
      apologies. Please leave a comment telling us to stop
      blacklisting sites matching [<?php
        echo $is_a_spammer->aspm_string
      ?>] so that this
      doesn't happen again.</p>
    <p>Thank you, and sorry for the inconvenience.</p>
    <p>If, on the other hand, you are a bandwidth-eating referrer spam robot,
      then we hope that your owner dies a painful death and rots in hell,
      and that his or her seed is scrubbed from the face of the earth.</p>
    <p style="text-align:center">--The Management</p>
    </body></html>
    <?php
    die();
  }
}
?>
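
For anyone who would rather not read SQL, here is a rough plain-PHP sketch of what that query is testing: a referer counts as spam if any blacklist string appears anywhere inside it. The function name and the sample keywords are made up for illustration, and the real hack lets MySQL do this against the antispam table (where % and _ inside an entry also act as wildcards, which this sketch ignores).

```php
<?php
// Hypothetical helper, not part of b2evolution: returns the first blacklist
// string found inside the referer, or null if nothing matches.
// Roughly mirrors:  'referer' LIKE CONCAT('%', aspm_string, '%')
function find_blacklisted_string($referer, array $blacklist)
{
    foreach ($blacklist as $keyword) {
        // Skip empty keywords: they would match every referer.
        if ($keyword === '') {
            continue;
        }
        // stripos() is case-insensitive, like LIKE under MySQL's default collations.
        if (stripos($referer, $keyword) !== false) {
            return $keyword;
        }
    }
    return null;
}

// Made-up sample data, for illustration only.
$blacklist = array('cheap-pills', 'poker-online');
$hit = find_blacklisted_string('http://www.cheap-pills.example/buy', $blacklist);
// $hit now holds the matching keyword; a clean referer would give null.
```

The matching keyword is returned rather than just true/false because the hack's error page prints it, so a real visitor can report a false positive.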

By the way, if you don't keep your antispam table up to date it doesn't help much. You NEED to keep that table up to date if you really want to keep them off your blog. Unfortunately that table is HUGE, and sometimes duplicates itself. Sort of. The same entry doesn't show up twice, but sometimes a short entry already bans all the referer traffic that a different, longer entry would ban. This brings me to the second groovy part of that thread: Pruning your antispam table. Again we see great code from Isaac. Save the following in your admin folder as b2antispam_prune.php and run it each time you get new entries in your antispam table.

Code

<?php
/**
* Blacklist Pruner for b2evolution
* by Isaac Z. Schlueter
* http://isaacschlueter.com/category/antispam
**/
 
// require the required things to make this all fly right so it can access your db and whatnot
require_once( dirname(__FILE__) . '/../conf/_config.php' );
$login_required = false;
require_once( dirname(__FILE__) . "/$admin_dirout/$core_subdir/_main.php" );
require_once (dirname(__FILE__).'/'.$admin_dirout.'/'.$core_subdir.'/_functions_antispam.php');
 
$output = '';
 
if(!isset($show_html))$show_html = false;
 
// get the list of all antispam strings sorted by length.
$DB->query('SELECT aspm_id,aspm_string FROM T_antispam order by length(aspm_string) ASC');
 
// this is copying and not referencing on purpose - we need to save this list as it is now.
$aspm = $DB->last_result;
// uncomment to see what it returns for debugging.  This is big and messy, though!
//$output .= '<pre>'."\n";
//$output .= print_r($aspm, true)."\n";
//$output .= '</pre>'."\n";
$totalaffected = 0;
foreach($aspm as $k => $v)
{
  if( !$DB->get_row('SELECT aspm_id FROM T_antispam WHERE aspm_ID = ' . $v->aspm_id) )
    continue; // already deleted this one.  go to the next one.
 
  $v->aspm_string = str_replace( array('%','_'), array('\\%','\\_'), $v->aspm_string);
  if( $v->aspm_string ) {
    $DB->query('DELETE from T_antispam WHERE aspm_ID <> ' .
                $v->aspm_id .
                " AND aspm_string LIKE '%" .
                $v->aspm_string . "%'");
  
    $affected = $DB->rows_affected;
    
    if($affected) {
      $output .= '<p>Deleted ' . $affected . ' that matched ' . $v->aspm_string . '</p>'."\n";
      $totalaffected+=$affected;
    }
  }
}
$output .= '<p>Pruned a total of ' . $totalaffected . ' rows from the blacklist.</p>'."\n";
if(!$show_html)$output = strip_tags($output);
echo $output;
?>

After uploading it to your server go ahead and run the program by typing "your_baseurl/admin/b2antispam_prune.php" into your browser's address bar. It'll take a little while to run and eventually tell you all sorts of stuff about how many 'duplicate' entries it removed. I had a stock installation with all the updates to antispam, and this tool reduced it by around 35%. That saves another tiny piece of bandwidth for you, but mostly it's just cleaning up the table.
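
If it helps to see why those 'duplicate' entries are safe to drop, here is a minimal in-memory sketch of the same idea, assuming the blacklist is just an array of strings. It is not Isaac's code and prune_blacklist() is a made-up name: the strings are sorted shortest first, and a string is kept only if no shorter kept string is contained in it.

```php
<?php
// Hypothetical in-memory version of the pruner's logic (not Isaac's code).
// Note: empty strings are dropped here; the real pruner simply skips them.
function prune_blacklist(array $strings)
{
    // Shortest first, like the pruner's ORDER BY length(aspm_string).
    usort($strings, function ($a, $b) {
        return strlen($a) - strlen($b);
    });

    $kept = array();
    foreach ($strings as $candidate) {
        if ($candidate === '') {
            continue;
        }
        $redundant = false;
        foreach ($kept as $shorter) {
            // A kept entry already matches every referer this one would match.
            if (stripos($candidate, $shorter) !== false) {
                $redundant = true;
                break;
            }
        }
        if (!$redundant) {
            $kept[] = $candidate;
        }
    }
    return $kept;
}
```

With made-up entries such as 'casino' and 'casino-online.example', only 'casino' survives, because any referer containing the longer string also contains the shorter one.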

Again: keep your antispam table up to date!

When you go to the admin area and hit the antispam tab are you tired of how long the antispam table is? Would you like to pull all the spammers out that are already in there but don't want to hit 'recheck' for each of them? Then use my [url=http://wonderwinds.com/hackblog.php/2005/02/07/antispam_recheck_tool_part3]antispam rechecker[/url] hack. It affects core files, so upgrading means losing the hack, but for now it will greatly simplify your life (with regard to blog spammers that is).

Are you tired of me saying you should keep your antispam table up to date? Would you like a simplified way of updating it? Another file from Isaac does exactly that. Isaac wrote a file that automatically checks for new updates, then wrote about it [url=http://isaacschlueter.com/blog/work/programming/automatic_antispam_update_cron/]on his blog[/url]. Since running a cron job exists outside your b2evolution installation I'm not going to duplicate it here. If you used my antispam hack and you don't mind running a cron job, then you might be interested in Isaac's [url=http://isaacschlueter.com/blog/hobbies-and-projects/b2evolution/b2evolution_antispam_recheck_cronjob/]automatic rechecker[/url]. By the way if you use Isaac's automatic rechecker be careful of where he uses 'no' and 'yes' for the 'rechecked' column. I changed the hack he refers to because it fails on some servers when using those terms. (no became needs and yes became gotit.)

[url=http://isaacschlueter.com/category/antispam]Visit Isaac's antispam category[/url] so he doesn't feel lonely after switching to a uniblog app.

Please don't respond to this thread with any questions about .htaccess or cron jobs or *anything* that isn't about files in your b2evolution installation. I know nothing about them, I don't do anything with them, and those aren't the reasons these forums exist.

Aug 24, 2005 03:58

Hi:

Why, if I call the hacks.php file do I get this error:
Fatal error: Call to a member function on a non-object in /home/******/public_html/conf/hacks.php on line 8

Aug 24, 2005 04:49

That sort of depends on what's on line 8 of your conf/hacks.php file :D

Have you installed any other hacks? Are you running the latest version of b2evolution? Did you have the conf/hacks file before doing this hack? I'm just trying to learn what your line 8 is and why this hack is choking there.

One problem that I noticed in the hack I re-posted is that it assumes you either don't have a conf/hacks file or know how to tweak it nicely. What I mean is that the code in this thread starts the conf/hacks file with a "<?php". If your hacks file already had one then you don't need it again. If on the other hand you never had a conf/hacks.php file then the code I copied from Isaac is exactly what you should use. It *should* work...

Other hacks and possibly older versions *might* be the reason you are getting an error.

If you're not running 0.9.0.12 (or at least 0.9.0.10 or 0.9.0.11 with both security patches installed) you should (a) upgrade and (b) not assume new hacks will work for you. If you have the latest version and installed the hack correctly (which is pretty simple eh?) then we have a problem and I would be happy to help you fix it.

Okay: what version are you running, what is in your hacks.php file from like line 1 to maybe line 12, and what other hacks do you have installed?

Aug 24, 2005 05:09

I'm using the latest version with security patches installed.
I've copied the code you posted and saved it as hacks.php into the conf/ folder. There wasn't any hacks.php file there before.
Line 8, I think, is this one:
$is_a_spammer = $DB->get_row("select aspm_ID, aspm_string from $tableantispam

Your instructions are very simple and I think I've followed them right but I get this error.

Aug 24, 2005 05:32

Okay that helps. The issue might be due to the editor you used to create the conf/hacks.php file. The line you show as line 8 should be much longer than what you copied here. It should be all this stuff on ONE line:

Code

$is_a_spammer = $DB->get_row("select aspm_ID, aspm_string from  $tableantispam where '" . $DB->escape($_SERVER['HTTP_REFERER']) . "' like concat('%',aspm_string,'%')");

Problem is phpbb will wrap the text to fit nicely on the screen, and, depending on which editor you use (and what settings it has) you might see that line broken up as line 8 and 9. That'd be b-a-d!

Just to show that line as one line in this forum, I'm going to try something:

Code

$foo = $DB->foo("select foo, bar from  $foo where '" . $_FOO['BAR'] . "' like foo('%',foobar,'%')");

Technically speaking that should be "fubar" but hey - foobar works! Well, I hope it is short enough to fit on your screen at your resolution... If nothing else it will show you where the spaces belong in the real line 8.

Good old notepad or fancy-schmancy wordpad work well for hacking php files. Of those two I sort of like wordpad better because it won't automatically do a word-wrap, but I don't use either. Something like MSWord is cool for documents you might print someday, but crap for code because it *assumes* you want everything to fit on a single sheet of paper. Some people use Dreamweaver to edit PHP files, others use FrontPage. My choice is [url=http://www.chami.com/html-kit/]HTML-Kit[/url]. It has a free version is why I like it. If you download HTML-Kit it will NOT do wordwrapping and, if it has to, it will show you that it is word-wrapping. It shows you each line number, which really helps troubleshooting issues like yours. If I ever get a job again I think I'll buy the full version because it's a wicked-cool editor.

Anyway, consider your editor and consider what line 8 should be. Open your conf/hacks.php file in some different editors (ideally something *very* simple, or, get HTML-Kit) and see if you can make line 8 be all the stuff I copied here.

Aug 24, 2005 06:37

Ok, I used TextPad 4.7.3 which until now behaved quite well but I will follow your recommendation.
Now, I corrected the line and now when I run the hacks.php I get a blank page, is that the correct behaviour?

Thanks for your help

Aug 24, 2005 06:54

Remove the hacks.php file and see if it restores your blog. Assuming your blog loads then there must be something wrong in your hacks.php file. We can then work on exactly what's wrong. If your blog still fails to load something else is wrong, but I'm 99% sure it's the hacks file.

I've seen that before: no error, but no page. It has always been because I did something wrong with my hacks file. The only way to fix it is to figure out exactly what the issue is.

One thing to look at, especially since your editor was an issue with line breaks, is "do you have any white space at the end of your hacks file?". White space refers to *anything* after the last "?>" including a return making a next line. The final character needs to be the ">". Anyway, getting rid of the hacks file will probably restore your blog. Then checking for white space will maybe fix your hacks file.

Oh plus also make sure everything else 'looks right' in the hacks file. If your editor messed up one line it may have messed up others as well.
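
A quick way to check the white space question without trusting your editor is to look at what follows the last closing tag. This is only a sketch and text_after_closing_tag() is a made-up name. As far as PHP itself goes, a single newline directly after the closing tag is swallowed, so it's extra spaces or blank lines that end up being sent as output, but keeping the ">" as the very last character avoids the question entirely.

```php
<?php
// Hypothetical helper: returns whatever follows the final closing tag in a
// PHP source string. An empty string means the tag is the last thing there.
function text_after_closing_tag($source)
{
    $pos = strrpos($source, '?' . '>');
    if ($pos === false) {
        return ''; // no closing tag at all, so nothing can trail it
    }
    // The cast covers older PHP versions where substr() can return false.
    return (string) substr($source, $pos + 2);
}

// Possible usage against the real file (the path is an assumption):
//   $trailing = text_after_closing_tag(file_get_contents('conf/hacks.php'));
//   var_dump($trailing);   // anything other than "" is worth deleting
```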

Aug 24, 2005 07:07

Probably.

There is no reason to actually run that file. b2evolution will run it every time your blog is requested by a visitor. If the visitor happens to be a spam referer then they won't get your blog. Instead they'll get a "403" error message. Everyone else will get your blog.
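
That is most likely also where the original "Call to a member function on a non-object" came from: when conf/hacks.php is requested directly, b2evolution hasn't loaded, so $DB doesn't exist yet and $DB->get_row() has nothing to call. If you want the file to stay quiet in that case, a guard along these lines is one option. It's a sketch only, and db_is_ready() is a made-up name, not a b2evolution function.

```php
<?php
// Hypothetical guard: true only when $db looks like a usable database object.
function db_is_ready($db)
{
    return is_object($db) && method_exists($db, 'get_row');
}

// Near the top of conf/hacks.php one could then bail out early:
//   if (!isset($DB) || !db_is_ready($DB)) { return; }
// A return in an included file hands control back to the file that included
// it; when the file is requested directly it just ends the script.
```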

Aug 24, 2005 07:19

How can I be sure the script is doing what it is supposed to do?
If you can answer this I will appreciate it and then no longer bother you since you have helped me well and enough

Best Regards

Aug 24, 2005 07:55

Hmmm...

With regard to the bit that went in the hacks file: You can't. Not right away anyway. What that hack does is subtle. First understand that a spammer who is pushing a domain name that is banned in your antispam table normally gets the whole page delivered in order to trigger the log_hit() function. At that point the hit is deemed a spammer and is not added to your hitlog table. This hack does it differently. As soon as a page is called by any visitor (including spammers) the hacks file is read. If the spammer is pushing a domain name that is banned in your antispam table they immediately get rejected AND get a 403 error. Most of them don't care about the 403 - they don't bother to clean up their lists, but you gain anyway. You gain by the spammer not taking all the server resources required to deliver the entire page before they are rejected.

Can you see your domain traffic data through your web host? My host gives me cPanel, which gives me (among other things) AWStats. I like it. Anyway one of the things it shows me is an overall view of how much bandwidth my domain uses each month. In June 2005 my web ate up 3.46 gigs of traffic. In early July I installed Isaac's "instant-403 hack", and my bandwidth consumption dropped dramatically: 1.66 gigs for July 2005. It's back up now because I do way too much FTPing of stuff, but I can see that *I* am the cause of the traffic. Another proof positive that this hack is working is the amount of bandwidth for error 403 in each month. In June it was ZERO and in July it was 12.33 MEGS. That tells me that all other considerations aside, I traded 1.8 GIGS of spammers getting to my log_hit() function for 12.33 MEGS of giving them 403s instead.

The auto-pruner hack is different. As it says, you run it from time to time. Type the url into your address bar, and you will certainly see that it does something. It reduces the overall number of entries in your antispam table, which just makes the process of reviewing it a wee bit quicker for both spammer 403s and all visitors when they get to the log_hit() function.

Aug 24, 2005 08:29

I'll be watching for the bandwidth changes from now on. It's not that my site gets too many visits a month; it uses from 1.5 to 2 GB of bandwidth. With this new script I hope to see whether my visits are human or the: bandwidth-eating referrer spam robot, then we hope that your owner dies a painful death and rots in hell, and that his or her seed is scrubbed from the face of the earth.

Best regards, your help has been very professional

Bye

Oct 11, 2005 05:48

Might be a silly question, but I'm asking anyway... the antispam pruner can be set up to run from cron-job, say, once daily, yes...?

If above=yes
then can it be tagged onto the end of the antispam recheck so it all gets handled at once? I run my antispam recheck (excellent hack, by the way) once every 3 hours, which might be excessive, but it's been working ok for me so far. Would running the pruner be too cpu intensive to be running that frequently, and aside from cpu usage, is it maybe pointless to run it that often anyway?

This is my current b2antispam_poll.php file, run from cron every 3 hours:

PHP

<?php
require_once( dirname(__FILE__) . '/../conf/_config.php' );
$login_required = false;
require_once( dirname(__FILE__) . "/$admin_dirout/$core_subdir/_main.php" );
require_once (dirname(__FILE__).'/'.$admin_dirout.'/'.$core_subdir.'/_functions_antispam.php');
b2evonet_poll_abuse();
?>

jj.

Oct 11, 2005 06:52

I'd say running the pruner more than once a MONTH is probably excessive, unless you tend to ban lots of things yourself and suspect that new keywords handled them in a more efficient manner.

The biggest gain of the pruner is accomplished the very first time you run it, and for new blogs (or people who completely empty their antispam table) you won't get much of a bang out of it anymore. I removed over 400 keywords from the table based on the pruner, and since have seen cause to remove only 2 more.

Oct 11, 2005 07:09

Okay, yeah, I was wondering about that aspect as well. When I ran it on a test blog, it didn't prune anything at all... I guess I can set it to run monthly via cron.

jj.

Dec 21, 2005 12:54

This is my story:
I installed b2evolution on my well-known culture-related portal. I had five blogs working. About a week after installing, it started getting web spam.
First of all, I used the tools provided in the control panel, then I installed plugins. Nothing seemed to work. One month later, my hosting provider told me to go, because I was using about 10 GB of bandwidth a day. I deleted b2evolution and its directory and bought a dedicated server. Spam keeps coming.

I have denied access to that directory from the htconf, but they keep trying to get the nonexistent pages:

<Directory /directory/here>
Order Deny,Allow
Deny from All
</Directory>

When I read the error log, I find about 20 attempts every second for that directory. Even though my bandwidth is now under control, does anyone know if this can bring down my server if it keeps growing? I am worried about the size of the error log. Does anyone know if it is possible to block an IP referer from the DNS server?

Jorgr

Dec 21, 2005 16:59

Your upstream provider can put an IP block in place, yes; however, it's typically done only in the most extreme of circumstances.

It's easier to handle those things with iptables/ipchains on Linux, and/or whatever the Win2K server equivalent is, though, if you can't manage it at the application level.

Feb 02, 2006 12:20

I installed the hack and it worked great - nice to see those bandwidth graphs looking much more normal!

The only problem I found was that the hack was causing MySQL to work very hard - only running version 3 so no query caching available. To solve the problem I modified the hack to do a simple caching of banned URLs.

Posting it here just in case it helps someone with a similar problem...

The cache works by creating a zero length file for each URL it bans and checking against this list before querying the database.

Create a directory called banned alongside your conf directory that is writable by the webserver and use this code:

Code

<?php
/**
* Bounce all referrers who are blacklisted
* Isaac Z. Schlueter
**/
if( !empty($_SERVER['HTTP_REFERER']) && strpos($_SERVER['HTTP_REFERER'],$baseurl) !== 0 )
{
  $filecheck = dirname(__FILE__) . "/../banned/".str_replace("/","_", $_SERVER['HTTP_REFERER']);
  if (file_exists($filecheck))  {
    $is_a_spammer = 1;
  }
  else {  
  $is_a_spammer = $DB->get_row("select aspm_ID, aspm_string from  $tableantispam
      where '" . $DB->escape($_SERVER['HTTP_REFERER']) . "' like concat('%',aspm_string,'%')");
  }
  
  if( $is_a_spammer ) {
    header('HTTP/1.0 403 Forbidden');
 
    // un-comment the next line of code to redirect back to the referring
    // page. I didn't do this, in the event that perhaps there is a false
    // positive, but you needn't be so kind.
    // In any event, the bandwidth is teensy either way.
    // header('Location: ' . $_SERVER['HTTP_REFERER']);
    ?>
    <html><head><title>Stop Referrer Spam!</title>
    </head><body>
    <p>You are being denied access to this page because you have been referred here by a
      known spammer: [<?php echo htmlspecialchars($_SERVER['HTTP_REFERER']) ?>].</p>
    <p>If you have reached this page in error, feel free to
      <a href="<?php echo $baseurl . $ReqURL ?>">bypass this message</a> with our
      apologies.</p>
    <p>Thank you, and sorry for the inconvenience.</p>
 
    <p style="text-align:center">--The Management</p>
    </body></html>
    <?php
    
    touch($filecheck);
    
    die();
  }
}
?>

As a side effect you can see which spammers are active by listing the contents of the banned directory in date order...
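
If you don't have shell access for that listing, something like the following could do it from PHP. It's only a sketch: banned_by_recency() is a made-up name and the directory path is whatever you created for the cache. Since the hack calls touch() on every blocked hit, the modification time is the last time that referer was bounced.

```php
<?php
// Hypothetical helper: names of the files in $dir, most recently touched first.
function banned_by_recency($dir)
{
    $times = array();
    foreach (scandir($dir) as $name) {
        $path = $dir . '/' . $name;
        // is_file() skips the "." and ".." directory entries.
        if (is_file($path)) {
            $times[$name] = filemtime($path);
        }
    }
    arsort($times); // newest modification time first, keys preserved
    return array_keys($times);
}

// Possible usage (the path is an assumption):
//   foreach (banned_by_recency(dirname(__FILE__) . '/../banned') as $name) {
//       echo htmlspecialchars($name), "<br />\n";
//   }
```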

Jon

Mar 20, 2007 17:10

Hallo EdB!

Do hacks.php and b2antispam_prune.php work with 1.9.3?

nureac

