isaac Hooked :)

Joined: 03 Dec 2003 Posts: 428
     
|
Posted: Fri Jun 17, 2005 17:45 Post subject: Antispam Bandwidth |
|
|
OK, so this has been a really popular topic on these forums lately, and I've been investigating a few possibilities.
.htaccess rules based on the blacklist are simply unreasonable (with my host, at least), since there are several thousand antispam strings, and it would be crazy to make Apache chug through all of them each time there's a referral.
Checking the referring page for a link to your site doesn't always work, since you might be linked from mail messages or other pages that can't be fetched. It also means delaying your own page until the referring page has loaded in the file_get_contents call, which doubles your load time and triggers a back-button frenzy.
RewriteMap doesn't work for me, since I can't access my httpd.conf.
Stick this in your conf/hacks.php file, and you're good to go. Very low bandwidth, stops referrer spiders at the front gate, and generally friendly.
| Code: |
<?php
/**
* Bounce all referrers who are blacklisted
* Isaac Z. Schlueter
**/

// Only check external referrers; requests coming from our own pages skip the lookup.
if( !empty($_SERVER['HTTP_REFERER']) && strpos($_SERVER['HTTP_REFERER'], $baseurl) !== 0 )
{
	// The referrer is supplied by the client, so escape it before it goes into the query.
	// (Assumes the b2evolution $DB object provides escape(), as ezSQL-derived classes do.)
	$referer_sql = $DB->escape($_SERVER['HTTP_REFERER']);
	$is_a_spammer = $DB->get_row("select aspm_ID, aspm_string from $tableantispam
		where '$referer_sql' like concat('%',aspm_string,'%')");
	if( $is_a_spammer ) {
		header('HTTP/1.0 403 Forbidden');
		// un-comment the next line of code to redirect back to the referring
		// page. I didn't do this, in the event that perhaps there is a false
		// positive, but you needn't be so kind.
		// In any event, the bandwidth is teensy either way.
		// header('Location: ' . $_SERVER['HTTP_REFERER']);
?>
<html><head><title>Stop Referrer Spam!</title>
</head><body>
<p>You are being denied access to this page because you have been referred here by a
known spammer: [<?php echo htmlspecialchars($_SERVER['HTTP_REFERER']) ?>].</p>
<p>If you have reached this page in error, feel free to
<a href="<?php echo htmlspecialchars($baseurl . $ReqURL) ?>">bypass this message</a> with our
apologies. Please leave a comment telling us to stop
blacklisting sites matching [<?php
echo htmlspecialchars($is_a_spammer->aspm_string)
?>] so that this
doesn't happen again.</p>
<p>Thank you, and sorry for the inconvenience.</p>
<p>If, on the other hand, you are a bandwidth-eating referrer spam robot,
then we hope that your owner dies a painful death and rots in hell,
and that his or her seed is scrubbed from the face of the earth.</p>
<p style="text-align:center">--The Management</p>
</body></html>
<?php
		die();
	}
}
?>
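For anyone wondering what the `like concat('%',aspm_string,'%')` test in the query boils down to: it asks whether any blacklisted string occurs somewhere inside the referrer. Here is a minimal sketch of roughly the same test in plain PHP, with no database involved. The function name `find_blacklisted_string` is made up for illustration and is not part of b2evolution, the three sample entries stand in for the real antispam table, and the type hints assume PHP 7.1 or later. Unlike LIKE, it treats `%` and `_` in an entry as literal characters.

```php
<?php
/**
 * Return the first blacklist entry found anywhere inside the referrer,
 * or null if none matches. The comparison is case-insensitive.
 * Hypothetical helper, not part of b2evolution.
 */
function find_blacklisted_string(string $referer, array $blacklist): ?string
{
    foreach ($blacklist as $entry) {
        // Skip empty entries: an empty string would "match" every referrer.
        if ($entry !== '' && stripos($referer, $entry) !== false) {
            return $entry;
        }
    }
    return null;
}

// Sample entries only; a real list would come from the antispam table.
$blacklist = ['-4-you.info', '-4u.net', '-adult-'];
$hit  = find_blacklisted_string('http://cheap-4u.net/pills', $blacklist);
$miss = find_blacklisted_string('http://www.google.com/search?q=b2evolution', $blacklist);
```

Here `$hit` holds the matching entry, which is what the denial page prints back to the visitor, and `$miss` is null, so the page would load normally.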
Last edited by isaac on Sun Jun 19, 2005 21:56; edited 1 time in total |
personman SuperGuru

 Joined: 09 Feb 2005 Posts: 2178
  votes: 15
|
Posted: Fri Jun 17, 2005 19:01 Post subject: |
|
|
I've applied the hack. Thanks. I laughed out loud when I read it.
ralphy New Poster

Joined: 13 Jun 2005 Posts: 25
 
|
Posted: Fri Jun 17, 2005 20:59 Post subject: |
|
|
I haven't done any speed tests, but I wonder how Apache reading an .htaccess file could be slower than a PHP script querying database tables...
Doesn't putting b2evolution's blacklist into an .htaccess file make b2evolution's own checks redundant?
(Nice message.)
BenFranske Seasoned Poster

Joined: 28 Jun 2004 Posts: 84
     
|
Posted: Sat Jun 18, 2005 15:03 Post subject: |
|
|
| ralphy wrote: |
| I haven't done any speed tests, but I wonder how Apache would be slower using an .htaccess file than a PHP script accessing database's tables... |
I was wondering about this too. I'd love to find the fastest, most CPU-efficient and least bandwidth-intensive way of checking for spammers, because one of my hosts killed one of my sites over the CPU cycles burned by all the page hits from spammers. Anyway, I think that using Apache by itself simply must be more efficient than having Apache call PHP, which calls MySQL, for each page hit. Cool idea though. If it's faster to do it this way, could someone explain why?
ralphy New Poster

Joined: 13 Jun 2005 Posts: 25
 
|
Posted: Sat Jun 18, 2005 22:41 Post subject: |
|
|
Here are some preliminary page loading speed test results:
| Code: |
+-------------+-------+----------+----------+----------+
| URL         | SIZE  | NO       | PACKED   | FULL     |
+-------------+-------+----------+----------+----------+
| Google.com  |  6 KB |   202 ms |        - |        - |
| Yahoo.com   | 69 KB | 1,504 ms |        - |        - |
+-------------+-------+----------+----------+----------+
| Google.html |  6 KB |     3 ms |   132 ms |   157 ms |
| Yahoo.html  | 69 KB |    14 ms |   142 ms |   174 ms |
+-------------+-------+----------+----------+----------+
| All         | 33 KB | 1,599 ms | 1,859 ms | 1,809 ms |
| Blog        | 35 KB | 1,050 ms | 1,084 ms | 1,230 ms |
| Post        | 35 KB |   627 ms |   711 ms |   737 ms |
+-------------+-------+----------+----------+----------+
In the above table:
-Google.com is the http://www.google.com page
-Yahoo.com is the http://www.yahoo.com page
-Google.html is a local static page with the same content as Google.com
-Yahoo.html is a local static page with the same content as Yahoo.com
-All is a local "All" blogs page
-Blog is another local blog page
-Post is a single post page
-NO means there was no .htaccess file at all
-PACKED means there was a "packed" .htaccess file
-FULL means there was a "full" .htaccess file
The PACKED and FULL .htaccess files each contain about 1,800 blacklisted strings to be checked.
The PACKED .htaccess file checks several strings at once on each line (lines are kept to about 8 KB, since Apache doesn't seem to handle longer ones):
| Code: |
RewriteCond %{HTTP_REFERER} (-4-you\.info|-4u\.net|-adult-|etc.) [NC]
RewriteRule .* - [F] |
The FULL .htaccess file checks one string per line:
| Code: |
RewriteCond %{HTTP_REFERER} (-4-you\.info) [NC,OR]
RewriteCond %{HTTP_REFERER} (-4u\.net) [NC,OR]
RewriteCond %{HTTP_REFERER} (-adult-) [NC,OR]
etc.
RewriteRule .* - [F] |
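Both layouts can be generated from a list of blacklist strings rather than typed by hand. What follows is a rough sketch in PHP under a few assumptions: `build_rewrite_rules` is a made-up name, the entries are assumed to contain no whitespace, and the 8,000-byte default is only a guess at staying under the roughly 8 KB line length mentioned above. `preg_quote` escapes regex metacharacters (dots, hyphens and so on), so the generated patterns are escaped more heavily than hand-written ones.

```php
<?php
/**
 * Build RewriteCond lines from blacklist strings, joining as many strings
 * per line with the regex OR (|) as fit within $maxLineBytes.
 * Hypothetical helper; assumes entries contain no whitespace.
 */
function build_rewrite_rules(array $strings, int $maxLineBytes = 8000): string
{
    $prefix = 'RewriteCond %{HTTP_REFERER} (';
    $suffix = ') [NC,OR]'; // longest possible line ending, used for measuring
    $groups = [];
    $current = '';
    foreach ($strings as $s) {
        if ($s === '') {
            continue; // an empty alternative would match every referrer
        }
        $escaped = preg_quote($s);
        $candidate = ($current === '') ? $escaped : $current . '|' . $escaped;
        if ($current !== '' && strlen($prefix . $candidate . $suffix) > $maxLineBytes) {
            // Line is full: close this group and start a new one.
            $groups[] = $current;
            $current = $escaped;
        } else {
            $current = $candidate;
        }
    }
    if ($current !== '') {
        $groups[] = $current;
    }

    $lines = [];
    $last = count($groups) - 1;
    foreach ($groups as $i => $group) {
        // Every condition but the last is chained with OR.
        $flags = ($i === $last) ? '[NC]' : '[NC,OR]';
        $lines[] = $prefix . $group . ') ' . $flags;
    }
    if ($lines) {
        $lines[] = 'RewriteRule .* - [F]';
    }
    return implode("\n", $lines);
}
```

Passing 0 as the limit forces one string per line, which gives the FULL layout. An empty list returns an empty string on purpose, so that a lone `RewriteRule .* - [F]` never ends up blocking every visitor.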
Last edited by ralphy on Sat Jun 18, 2005 22:51; edited 1 time in total |
BenFranske Seasoned Poster

Joined: 28 Jun 2004 Posts: 84
     
|
Posted: Sat Jun 18, 2005 22:45 Post subject: |
|
|
What that tells me is that a lot more time is spent rendering the PHP than reading a full .htaccess blacklist, so it's probably not a concern as long as it's not too processor-intensive. Also, I'd be interested in finding out whether my SetEnvIf... .htaccess method is faster or slower than the rewrite method.
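The SetEnvIf rules themselves aren't shown in this thread, so the following is only a guess at the usual shape of that approach: mod_setenvif marks bad referrers with an environment variable, and an access rule then denies any request that carries it. The PHP sketch below generates such a block from a list of strings. The function name `build_setenvif_rules` and the variable name `spam_ref` are made up for illustration, and the `Order` / `Deny from env=` lines are the access-control syntax used by Apache versions before 2.4.

```php
<?php
/**
 * Build a SetEnvIfNoCase block that flags blacklisted referrers and
 * denies flagged requests. Hypothetical helper, not the script
 * discussed in this thread.
 */
function build_setenvif_rules(array $strings, string $envVar = 'spam_ref'): string
{
    $lines = [];
    foreach ($strings as $s) {
        // Skip empty entries (they would match everything) and entries with
        // a double quote, which would break the quoted pattern.
        if ($s === '' || strpos($s, '"') !== false) {
            continue;
        }
        $lines[] = 'SetEnvIfNoCase Referer "' . preg_quote($s) . '" ' . $envVar;
    }
    if (!$lines) {
        return '';
    }
    // Allow everyone first, then deny requests flagged above.
    $lines[] = 'Order Allow,Deny';
    $lines[] = 'Allow from all';
    $lines[] = 'Deny from env=' . $envVar;
    return implode("\n", $lines);
}
```

Whether this is faster or slower than the RewriteCond layouts would still have to be measured; the sketch only shows what would be compared.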
Last edited by BenFranske on Sat Jun 18, 2005 22:46; edited 1 time in total |
ralphy New Poster

Joined: 13 Jun 2005 Posts: 25
 
|
Posted: Sat Jun 18, 2005 22:46 Post subject: |
|
|
Going by the timings above, .htaccess files apparently save bandwidth but cost CPU. It would be interesting to compare the .htaccess-based and PHP-based strategies for fighting log/stats spamming in terms of CPU usage.
I'm going to test Kweb's idea about httpd.conf later today:
| Kweb wrote: |
For those of you who actually manage your own web server, you should note that it is much faster to put this kind of stuff in a <Directory> directive container in the httpd.conf file instead of using .htaccess files. The httpd.conf file is loaded and read once at startup, whereas the .htaccess files are loaded and read for each request.
roadies New Poster

Joined: 21 Jul 2003 Posts: 28
    
|
Posted: Sun Jun 19, 2005 11:40 Post subject: |
|
|
If only we could block at the switch.
Blog: Jason Murphy
personman SuperGuru

 Joined: 09 Feb 2005 Posts: 2178
  votes: 15
|
Posted: Sun Jun 19, 2005 11:49 Post subject: |
|
|
How often do you all report spammers to their ISPs and/or web hosts?
isaac Hooked :)

Joined: 03 Dec 2003 Posts: 428
     
|
Posted: Sun Jun 19, 2005 22:02 Post subject: |
|
|
I added another check in there so that it doesn't keep you from clicking through from your own antispam page (quite annoying!)
ralphy New Poster

Joined: 13 Jun 2005 Posts: 25
 
|
BenFranske Seasoned Poster

Joined: 28 Jun 2004 Posts: 84
     
|
Posted: Sun Jun 19, 2005 23:56 Post subject: |
|
|
I've read several of those articles because referer spam has become a really big problem for me, to the point that I've closed my site for the time being. Most of the commentators go on about how using .htaccess is an unwinnable fight (maybe so if you're doing it by hand, hence my automatic "Refer This!" script), yet they really have no better solution than what I'm doing, which is to block based on a blacklist (in my case the b2 antispam blacklist generates the .htaccess file). My quest is to find the least processor-intensive way of dealing with the problem, because that's why I had to pull my site: it was using too many processor cycles when the spambots hit a page, since each hit caused a database lookup. As far as I can tell, the best we can do now is to generate an .htaccess blacklist from the database. Of course, I will make sure I'm not publishing referer information either. If anyone else has suggestions, I'm all ears.
BenFranske Seasoned Poster

Joined: 28 Jun 2004 Posts: 84
     
|
Posted: Sun Jun 19, 2005 23:58 Post subject: Re: Antispam Bandwidth |
|
|
| isaac wrote: |
| .htaccess rules based on the blacklist are simply unreasonable (with my host, at least) since you have several thousands of antispam strings, and it would be crazy to make Apache chug through all of those each time there's a referral. |
Isaac, can you explain to me how doing a relational database lookup is less processor-intensive than the flat-file (.htaccess) method, which only involves Apache? I really do need to find a way to block all these false hits without draining too many resources.
isaac Hooked :)

Joined: 03 Dec 2003 Posts: 428
     
|
Posted: Mon Jun 20, 2005 8:52 Post subject: |
|
|
I'm not sure, Ben. In fact, it seems to me that you should be right: .htaccess should be faster.
However, when I put more than about 500 RewriteCond directives in my .htaccess, my site suddenly gets slow as molasses.
There are a lot of reasons why that might be, and I'm not enough of an Apache expert to say which. I know that relational database lookups are really not that slow, depending on a few factors, because of all sorts of indexing and whatnot that goes on in the backend. Then again, I've got more experience with that side of things on MSSQL than on MySQL, so I can't really come up with figures or very sound explanations. On the other hand, Apache must actually read, process, and cache all the RewriteConds until it gets to a RewriteRule. Perhaps packing the rules, and having several Cond/Rule bunches, could optimize this somewhat. I'm not sure.
Without a doubt, more research is needed. And, like birth control, I suspect that multiple methods of prevention are the way to go.
ralphy New Poster

Joined: 13 Jun 2005 Posts: 25
 
|
Posted: Mon Jun 20, 2005 17:03 Post subject: |
|
|
I took the figures from andreasfriedrich's ".htaccess vs httpd.conf HTTP_USER_AGENT Blocking Benchmarks" and charted them to make them more readable.
It appears that a "Multiple RewriteCond" .htaccess (several conditions packed on the same line using the regular expression OR) performs pretty well, even if it is slower than a "Simple RewriteCond" httpd.conf (one condition per line).