Antispam Bandwidth
isaac
Posted: Fri Jun 17, 2005 17:45

OK, so this has been a really popular topic on these forums lately, and I've been investigating a few possibilities.

.htaccess rules based on the blacklist are simply unreasonable (with my host, at least), since there are several thousand antispam strings and it would be crazy to make Apache chug through all of them each time there's a referral.

Checking the referring page for a link to your site doesn't always work, since links to your site may come from mail messages or other hidden places. It also means delaying your page until the other page has loaded into the file_get_contents call, which doubles your load time and triggers a back-button frenzy.

RewriteMap doesn't work, since I can't access my httpd.conf.

Stick this in your conf/hacks.php file, and you're good to go. Very low bandwidth, prevents referrer spiders from getting past the front gate, and generally friendly.
Code:
<?php
/**
 * Bounce all referrers who are blacklisted
 * Isaac Z. Schlueter
 **/
// Only check referrers from outside this site, so that links clicked on
// your own pages (including the antispam page) still work.
if( !empty($_SERVER['HTTP_REFERER']) && strpos($_SERVER['HTTP_REFERER'],$baseurl) !== 0 )
{
  // The referrer is supplied by the client: escape it before it goes into
  // the query, and before it is echoed into the HTML below.
  $referer_sql = $DB->escape($_SERVER['HTTP_REFERER']);
  $referer_html = htmlspecialchars($_SERVER['HTTP_REFERER']);

  // Look for a blacklist string that occurs anywhere in the referrer.
  $is_a_spammer = $DB->get_row("select aspm_ID, aspm_string from  $tableantispam
      where '" . $referer_sql . "' like concat('%',aspm_string,'%')");
  if( $is_a_spammer ) {
    header('HTTP/1.0 403 Forbidden');

    // un-comment the next line of code to redirect back to the referring
    // page. I didn't do this, in the event that perhaps there is a false
    // positive, but you needn't be so kind.
    // In any event, the bandwidth is teensy either way.
    // header('Location: ' . $_SERVER['HTTP_REFERER']);
    ?>
    <html><head><title>Stop Referrer Spam!</title>
    </head><body>
    <p>You are being denied access to this page because you have been referred here by a
      known spammer: [<?php echo $referer_html ?>].</p>
    <p>If you have reached this page in error, feel free to
      <a href="<?php echo htmlspecialchars($baseurl . $ReqURL) ?>">bypass this message</a> with our
      apologies. Please leave a comment telling us to stop
      blacklisting sites matching [<?php
        echo htmlspecialchars($is_a_spammer->aspm_string)
      ?>] so that this
      doesn't happen again.</p>
    <p>Thank you, and sorry for the inconvenience.</p>
    <p>If, on the other hand, you are a bandwidth-eating referrer spam robot,
      then we hope that your owner dies a painful death and rots in hell,
      and that his or her seed is scrubbed from the face of the earth.</p>
    <p style="text-align:center">--The Management</p>
    </body></html>
    <?php
    die();
  }
}
?>
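
For anyone who wants to try the matching logic without a database: the query above asks whether any blacklist string occurs anywhere inside the referrer. Below is a rough sketch of the same test in plain PHP. The function name and the sample strings are made up for illustration and are not part of b2evolution, and the case-insensitive comparison is an assumption (MySQL's default collations compare without regard to case).

```php
<?php
/**
 * Return the first blacklist string contained in $referer, or null.
 * Hypothetical helper for illustration; not part of b2evolution.
 */
function find_blacklisted_string($referer, array $blacklist)
{
    foreach ($blacklist as $needle) {
        // Skip empty entries: an empty needle would match every referrer.
        if ($needle === '') {
            continue;
        }
        // stripos() is a case-insensitive substring search.
        if (stripos($referer, $needle) !== false) {
            return $needle;
        }
    }
    return null;
}

// Made-up sample data, for illustration only.
$blacklist = array('-4u.net', '-adult-', 'casino-example');
$hit = find_blacklisted_string('http://cheap-4u.net/page', $blacklist);
echo $hit === null ? "clean\n" : "blocked by: $hit\n";
```

Either way, every blacklist entry is tested on each request. In the SQL form the column sits on the pattern side of LIKE, so MySQL has to examine every row rather than use an index.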


Last edited by isaac on Sun Jun 19, 2005 21:56; edited 1 time in total

personman
Posted: Fri Jun 17, 2005 19:01

I've applied the hack. Thanks. I laughed out loud when I read it.

ralphy
Posted: Fri Jun 17, 2005 20:59

I haven't done any speed tests, but I wonder how Apache could be slower using an .htaccess file than a PHP script accessing database tables...

Doesn't putting b2evolution's blacklist into an .htaccess file make b2evolution's own checks useless?

(Nice message. ;) )

BenFranske
Posted: Sat Jun 18, 2005 15:03

ralphy wrote:
I haven't done any speed tests, but I wonder how Apache could be slower using an .htaccess file than a PHP script accessing database tables...


I was wondering about this too. I'd love to find the fastest, most CPU-efficient and least bandwidth-intensive way of checking for spammers, because one of my hosts killed one of my sites over the CPU cycles used by all the page hits from spammers. Anyway, I think that using Apache itself simply must be more efficient than having Apache call PHP, which then calls MySQL, for each page hit. Cool idea though. If it's faster to do it this way, could someone explain why?

ralphy
Posted: Sat Jun 18, 2005 22:41

Here are some preliminary page loading speed test results:

Code:

+-------------+----------+----------+----------+----------+
| URL         |   SIZE   |    NO    |  PACKED  |  FULL    |
+-------------+----------+----------+----------+----------+
| Google.com  |     6 KB |   202 ms |     -    |     -    |
| Yahoo.com   |    69 KB | 1,504 ms |     -    |     -    |
+-------------+----------+----------+----------+----------+
| Google.html |     6 KB |     3 ms |   132 ms |   157 ms |
| Yahoo.html  |    69 KB |    14 ms |   142 ms |   174 ms |
+-------------+----------+----------+----------+----------+
| All         |    33 KB | 1,599 ms | 1,859 ms | 1,809 ms |
| Blog        |    35 KB | 1,050 ms | 1,084 ms | 1,230 ms |
| Post        |    35 KB |   627 ms |   711 ms |   737 ms |
+-------------+----------+----------+----------+----------+


In the above table:
-Google.com is the http://www.google.com page
-Yahoo.com is the http://www.yahoo.com page

-Google.html is a local static page with the same content as Google.com
-Yahoo.html is a local static page with the same content as Yahoo.com

-All is a local "All" blogs page
-Blog is another local blog page
-Post is a single post page

-NO means there was no .htaccess file at all
-PACKED means there was a "packed" .htaccess file
-FULL means there was a "full" .htaccess file


The PACKED and FULL .htaccess files each contain about 1,800 blacklisted strings to check.

The PACKED .htaccess file checks several strings at once on each line (lines are kept to 8 KB, since Apache doesn't seem to handle longer ones):

Code:
RewriteCond %{HTTP_REFERER} (-4-you\.info|-4u\.net|-adult-|etc.) [NC]
RewriteRule .* - [F]


The FULL .htaccess file checks one string per line:
Code:
RewriteCond %{HTTP_REFERER} (-4-you\.info) [NC,OR]
RewriteCond %{HTTP_REFERER} (-4u\.net) [NC,OR]
RewriteCond %{HTTP_REFERER} (-adult-) [NC,OR]
etc.
RewriteRule .* - [F]
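
A PACKED file like the one described above could be generated along these lines. This is only a sketch, not the script used for the tests: the function name is hypothetical, the 8000-byte default merely approximates the 8 KB line limit mentioned above, and it assumes the blacklist entries contain no whitespace (which would split the RewriteCond arguments).

```php
<?php
/**
 * Build "packed" RewriteCond lines: as many alternatives per line as fit
 * under $max_len bytes. Hypothetical helper; the default limit is an
 * approximation of the 8 KB figure, not a documented Apache constant.
 */
function build_packed_htaccess(array $strings, $max_len = 8000)
{
    $prefix = 'RewriteCond %{HTTP_REFERER} (';
    $groups = array();
    $current = '';
    foreach ($strings as $s) {
        // Escape regex metacharacters so each entry matches literally.
        $escaped = preg_quote($s);
        $candidate = ($current === '') ? $escaped : $current . '|' . $escaped;
        // Leave room for the prefix and the longest suffix, ") [NC,OR]".
        if ($current !== '' && strlen($prefix) + strlen($candidate) + 9 > $max_len) {
            $groups[] = $current;
            $current = $escaped;
        } else {
            $current = $candidate;
        }
    }
    if ($current !== '') {
        $groups[] = $current;
    }
    if (count($groups) === 0) {
        return '';   // nothing to block, so emit no rules
    }
    $lines = array();
    $last = count($groups) - 1;
    foreach ($groups as $i => $group) {
        // Every condition except the last is chained with OR.
        $flags = ($i === $last) ? '[NC]' : '[NC,OR]';
        $lines[] = $prefix . $group . ') ' . $flags;
    }
    $lines[] = 'RewriteRule .* - [F]';
    return implode("\n", $lines) . "\n";
}

echo build_packed_htaccess(array('-4-you.info', '-4u.net', '-adult-'));
```

Passing a very large $max_len gives a single condition, and a very small one puts each string on its own line, which is the FULL layout.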


Last edited by ralphy on Sat Jun 18, 2005 22:51; edited 1 time in total

BenFranske
Posted: Sat Jun 18, 2005 22:45

What that tells me is that a lot more time is spent rendering the PHP than reading a full htaccess blacklist, so it's probably not a concern as long as it's not too processor intensive. Also, I'd be interested in finding out whether my SetEnvIf... htaccess method is faster or slower than the rewrite method.
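
The SetEnvIf method itself isn't shown in this thread, so the following is only a guess at one common shape of it, generated from a blacklist: each entry sets an environment variable when the Referer matches, and a final rule denies any request carrying that variable. The function name and the variable name "spam_ref" are made up for this sketch, and the access-control lines assume Apache 1.3/2.0-style Order/Allow/Deny directives.

```php
<?php
/**
 * Emit one SetEnvIfNoCase line per blacklist entry, then deny any request
 * that had the variable set. Hypothetical helper; "spam_ref" is an
 * arbitrary variable name chosen for this sketch.
 */
function build_setenvif_htaccess(array $strings, $env = 'spam_ref')
{
    $lines = array();
    foreach ($strings as $s) {
        // Match the entry literally, and keep the quoted argument intact.
        $pattern = str_replace('"', '\\"', preg_quote($s));
        $lines[] = 'SetEnvIfNoCase Referer "' . $pattern . '" ' . $env;
    }
    // Apache 1.3/2.0-style access control: Deny wins over Allow here.
    $lines[] = 'Order Allow,Deny';
    $lines[] = 'Allow from all';
    $lines[] = 'Deny from env=' . $env;
    return implode("\n", $lines) . "\n";
}

echo build_setenvif_htaccess(array('-4u.net', '-adult-'));
```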

Last edited by BenFranske on Sat Jun 18, 2005 22:46; edited 1 time in total

ralphy
Posted: Sat Jun 18, 2005 22:46


Apparently, .htaccess files save bandwidth but cost CPU. It would be interesting to compare the .htaccess- and PHP-based strategies for fighting log/stats spamming in terms of CPU usage.

I'm going to test Kweb's idea about httpd.conf later today:

Kweb wrote:
For those of you who actually manage your own web server, you should note that it is much faster to put this kind of stuff in a <Directory> directive container in the httpd.conf file instead of using .htaccess files. The httpd.conf file is loaded and read once at startup, whereas the .htaccess files are loaded and read for each request.

roadies
Posted: Sun Jun 19, 2005 11:40

if only we could block at the switch
_________________
Blog: Jason Murphy

personman
Posted: Sun Jun 19, 2005 11:49

How often do you all report spammers to their ISPs and/or web hosts?

isaac
Posted: Sun Jun 19, 2005 22:02

I added another check in there so that it doesn't keep you from clicking from your own antispam page (quite annoying!)

ralphy
Posted: Sun Jun 19, 2005 22:38

Two very interesting articles about referrer spam:
Proposal on referrer spam: Background and blacklists
Referrer and Comment Spam: A Primer

The MT-Blacklist plugin spammer blacklist:
MT-Blacklist Master Copy

A PHP script intended to update a .htaccess file with referrer spammers after Apache logs analysis:
Referrer Spam Fucker 3000

Using DNSBL appears to be a nice idea:
MT-DSBL
New anti-spam trick using DSBL

The DNSBL can be used directly from Apache, as explained here:
referrer-b-gone

Finally, a page to test the referrer spam rules (it allows setting a referrer before loading a given page):
WANNABrowser

Also worth reading is this discussion about fighting spam (especially blocking by HTTP_USER_AGENT):
A Close to perfect .htaccess ban list
Don't miss andreasfriedrich's benchmarks of single and multiple RewriteCond in .htaccess and httpd.conf:
A Close to perfect .htaccess ban list (Page #8)
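
For the DNSBL idea mentioned above, here is a minimal sketch of how such a lookup is usually formed from PHP, assuming the usual convention: the visitor's IPv4 octets are reversed, the list's zone is appended, and the address counts as listed if that name resolves. Both function names are hypothetical, and the zone shown is a placeholder rather than a real blacklist.

```php
<?php
/**
 * Build the hostname to look up for an IPv4 address in a DNS blacklist:
 * octets reversed, then the list's zone. Returns null for non-IPv4 input.
 * Hypothetical helper; the zone used below is a placeholder.
 */
function dnsbl_query_name($ip, $zone)
{
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null;
    }
    return implode('.', array_reverse(explode('.', $ip))) . '.' . $zone;
}

/**
 * True if the address is listed, i.e. the query name has an A record.
 */
function is_listed($ip, $zone)
{
    $name = dnsbl_query_name($ip, $zone);
    // The trailing dot keeps the resolver from appending search domains.
    return $name !== null && checkdnsrr($name . '.', 'A');
}

echo dnsbl_query_name('192.0.2.10', 'dnsbl.example.org'), "\n";
```

In a hack like the one at the top of this thread, is_listed() would be called with $_SERVER['REMOTE_ADDR']. It costs a DNS round trip per request, so caching the answer per IP would probably be wise.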

BenFranske
Posted: Sun Jun 19, 2005 23:56

I've read several of those articles because referrer spam has become a really big problem for me, to the point that I've closed my site for the time being. Most of the commentators go on about how using htaccess is an unwinnable fight (maybe so if you're doing it by hand, hence my automatic "Refer This!" script), yet they really have no better solution than what I'm doing, which is to block based on a blacklist (in my case the b2 antispam blacklist generates the htaccess file). My quest is to find the least processor-intensive way of dealing with the problem, because that's why I had to pull my site: it was using too many processor cycles when the spambots accessed the page, which caused a database lookup. As far as I can tell, the best we can do now is to generate an htaccess blacklist based on the database. Of course I will make sure I'm not publishing referrer information either. If anyone else has suggestions, I'm all ears.

BenFranske
Posted: Sun Jun 19, 2005 23:58

isaac wrote:
.htaccess rules based on the blacklist are simply unreasonable (with my host, at least), since there are several thousand antispam strings and it would be crazy to make Apache chug through all of them each time there's a referral.


Isaac, can you explain to me how doing a relational database lookup is less processor intensive than the flat file (htaccess) method, which only involves Apache? I really do need to find a way to block all these false hits without draining too many resources.

isaac
Posted: Mon Jun 20, 2005 8:52

I'm not sure, Ben. In fact, it seems to me that you should be right - .htaccess should be faster.

However, when I put more than about 500 RewriteCond directives in my .htaccess, my site suddenly gets slow as molasses.

There are a lot of reasons why that might be, and I'm not enough of an Apache expert to say which. I know that relational database lookups are really not that slow, depending on a few factors, because of all sorts of indexing and whatnot that goes on in the backend. Then again, I've got more experience with that side of things on MSSQL than on MySQL, so I can't really come up with figures or very sound explanations. On the other hand, Apache must actually read, process, and cache all the RewriteConds until it gets to a RewriteRule. Perhaps packing the rules, and having several Cond/Rule bunches, could optimize this somewhat. I'm not sure.

Without a doubt, more research is needed. And, like birth control, I suspect that multiple methods of prevention are the way to go. :)

ralphy
Posted: Mon Jun 20, 2005 17:03

I charted the figures from andreasfriedrich's .htaccess vs httpd.conf HTTP_USER_AGENT Blocking Benchmarks to make them more readable:

[Image: chart of the benchmark figures]

It appears that "Multiple RewriteCond" in .htaccess (several conditions packed on the same line using the regular expression OR) does pretty well, even if it is slower than "Simple RewriteCond" in httpd.conf (one condition per line).

All times are GMT - 5 Hours