
1 Jun 16, 2005 01:08    

I'm aware there are other anti-spam mechanisms out there; this is based upon one of them. The others do not deal with my problem: my blog mostly resides on a few large pages, and I cannot have these bots downloading 10,000 pages a day. They've already eaten up 4GB of bandwidth this month.

Here is my solution. It's ugly but it works.

This code should be placed at the top of index.php:


// Send the visitor back to the referring URL and stop, using very little bandwidth
function diee() {
	header('Location: '.$_SERVER['HTTP_REFERER']);
	die('not joking'); // just in case they ignore the redirect
}

// Set your site's URL
$checkurl = "myurl.com";

if (!empty($_SERVER['HTTP_REFERER'])) {
	mysql_connect('localhost', 'username', 'password');
	mysql_select_db('Database');

	// Trim to the column size (255) and escape once, then reuse in every query
	$ref = mysql_real_escape_string(substr($_SERVER['HTTP_REFERER'], 0, 255));

	$sql = "SELECT `spam` FROM `ref_blacklist` WHERE `ref` = '".$ref."'";
	$r = mysql_query($sql);
	if ($r && mysql_num_rows($r) > 0) {
		// Known referer: block it only if it was flagged as spam
		if (mysql_result($r, 0, 0) == "Y") {
			diee();
		}
	}
	else {
		// Unknown referer: download the referring page and look for this site's URL in it
		$page = @file_get_contents($_SERVER['HTTP_REFERER']);
		// strpos() returns 0 for a match at the very start, so compare against false
		$spam = ($page !== false && strpos($page, $checkurl) !== false) ? 'N' : 'Y';

		$sql = "INSERT INTO `ref_blacklist` ( `ref`,`spam` ) VALUES ('".$ref."','".$spam."');";
		mysql_query($sql);
		if ($spam == 'Y') {
			diee();
		}
	}
}

This table should be made in MySQL:


CREATE TABLE `ref_blacklist` (
  `id` int(10) NOT NULL auto_increment,
  `ref` varchar(255) NOT NULL default '',
  `spam` char(1) NOT NULL default '',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM;

Basically, for any visitor whose referer isn't in the database yet, the page at the referer's address is downloaded. If $checkurl isn't found in that page, the URL is added to the database as a spammer; if it is found, the URL is added as not being a spammer.

Spammers will be redirected (with very little bandwidth use) to their referer. This may not even work on some bots, but on others it will hopefully screw them up, or at least eat up bandwidth for the host they're spamming.

Just thought I'd share :-)
It also eliminates slow loading times: soon all your common referrers will be stored as legit, and users won't have to wait for your webserver to download their referer before the site loads (this happens with other scripts).

3 Jun 16, 2005 04:16

Wouldn't this mean that you can't email someone your website address? If they click on the link, then they'll get redirected to mail.yahoo.com or wherever. For example, I emailed a friend my URL, and I got a referrer hit from [url=http://us.f303.mail.yahoo.com/ym/ShowLetter?MsgId=1058_2078444_2303_1579_9304_0_19243_38363_1481628017&Idx=6&YY=43516&inc=25&order=down&sort=date&pos=0&view=&head=&box=Inbox]http://us.f303.mail.yahoo.com/ym/ShowLetter?MsgId=1058_14&Idx=6&YY=43516&in...&box=Inbox[/url], but when I click on the link, it just goes to a "Your mail session has expired, please login" page.

I've tried exporting my spam blacklist to a set of RewriteCond/Rule commands, and yes, it certainly does bog down Apache - I got 501 errors all over the place! I think a .php-based rewrite map might do the trick, but I haven't had the time to research it.
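For anyone who wants to experiment with the rewrite map idea, here is a rough sketch. It uses a plain-text map rather than a PHP-based one, and it matches whole referer hosts rather than the blacklist's substrings, so it is not a drop-in replacement. The map name spamhosts and the file path are made up for illustration, and RewriteMap only works in the server or virtual host configuration, not in .htaccess.

```apache
# Server or virtual-host config only: RewriteMap is not allowed in .htaccess
# /path/to/spam_hosts.txt is a placeholder; each line reads "<host> spam"
RewriteEngine On
RewriteMap spamhosts txt:/path/to/spam_hosts.txt

# Capture the host part of the referer as %1
RewriteCond %{HTTP_REFERER} ^https?://([^/:]+) [NC]
# Look the host up in the map; unknown hosts fall back to "ok"
RewriteCond ${spamhosts:%1|ok} =spam
RewriteRule .* - [F]
```

The point of the design is that each request costs one map lookup instead of a pass over hundreds of regular expressions.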

4 Jun 16, 2005 06:18

If you have the rights to override Apache behavior using a .htaccess file, you can use the following script to reduce bandwidth usage by referer spammers.

This is an antispam_generator.php script I've just created. It generates .htaccess-format rules from the [url=http://b2evolution.net]b2evolution[/url] antispam blacklist:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Block Referer Spam .htaccess Generator for b2evolution</title>
<meta name="robots" content="noindex,nofollow"/>
</head>
<body>

<?php

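// Refuse to run when a referer is present (empty() also avoids a notice when the header is missing)
if( !empty( $_SERVER['HTTP_REFERER'] ) )
{
	die( '<p>This page must be accessed directly (not referred).</p>' );
}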

// Default variable values
if( !isset( $host ) ) $host = 'localhost';
if( !isset( $username ) ) $username = 'username';
if( !isset( $password ) ) $password = 'password';
if( !isset( $database ) ) $database = 'database';
if( !isset( $prefix ) ) $prefix = 'evo_';

// Connect to the database
mysql_connect( $host, $username, $password ); 
mysql_select_db( $database );

// Select spamming substrings
$query = "SELECT aspm_string FROM `".$prefix."antispam`";
$result = mysql_query( $query );
if( !$result )
{
	die( '<p>Invalid query: '. mysql_error().'</p>' );
}
$num_rows = mysql_num_rows( $result );
if( $num_rows < 1 )
{
	die( '<p>Empty antispam list</p>' );
}

echo( "<p>Use the following code to create a <em>.htaccess</em> file in your <a href=\"http://b2evolution.net\">b2evolution</a> blogs folder on the server (ask your system administrator whether the Apache configuration lets you override web server behavior with <em>.htaccess</em> files):</p><hr/>" );
echo( "<p><code>" );
echo( "# Activate rewrite rules<br/>" );
echo( "RewriteEngine On<br/><br/>" );
echo( "# Block referer spam<br/>" );
$displayed_rows = 0;
while( $row = mysql_fetch_row( $result ) )
{
	// Escape dots and percent signs so they are matched literally
	$referer = preg_replace( "/([\\.\\%])/", "\\\\$1", $row[ 0 ] );
	// htmlspecialchars() keeps characters such as & and < intact when the rule is displayed
	echo( "RewriteCond %{HTTP_REFERER} (".htmlspecialchars( $referer ).") [NC" );
	if( ++$displayed_rows < $num_rows )
	{
		// Do not write the "OR" on the last condition line
		echo( ",OR" );
	}
	echo( "]<br/>" );
}

// Everyone can access the following file (even spammers)
echo( "RewriteCond %{REQUEST_URI} !(antispam\.php) [NC]<br/>" );

// Choose the referer spammer behavior
// (one and only one of the two following lines must be uncommented)
echo( "RewriteRule .* - [F]<br/>" ); // Stop (minimum bandwidth usage)
//echo( "RewriteRule .* antispam.php?from=%{HTTP_REFERER}&to=%{REQUEST_URI} [R=302,L]<br/>" ); // Redirect to another page (returning a 'temporary redirect' status)

echo( "</code></p>" );

?>

</body>
</html>

(As you can see in the script, some database login information needs to be supplied.)

In the previous script, use the following code to keep bandwidth usage to a minimum:

// Choose the referer spammer behavior
echo( "RewriteRule .* - [F]<br/>" ); // Stop (minimum bandwidth usage)


or use something more user-friendly, like the following code that redirects to an explanation page (antispam.php):

// Choose the referer spammer behavior 
echo( "RewriteRule .* antispam.php?from=%{HTTP_REFERER}&to=%{REQUEST_URI} [R=302,L]<br/>" ); // Redirect to another page (returning a 'temporary redirect' status) 

The previous antispam_generator.php script displays something like this:

# Activate rewrite rules
RewriteEngine On

# Block referer spam
RewriteCond %{HTTP_REFERER} (-adult-) [NC,OR]
RewriteCond %{HTTP_REFERER} (-amateur\.blogspot\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (-billed-) [NC,OR]
RewriteCond %{HTTP_REFERER} (-casino-) [NC,OR]
RewriteCond %{HTTP_REFERER} (_investments\.) [NC]
RewriteCond %{REQUEST_URI} !(antispam\.php) [NC]
RewriteRule .* antispam.php?from=%{HTTP_REFERER}&to=%{REQUEST_URI} [R=302,L]

(In fact, there are about 1,800 RewriteCond lines, which grow the .htaccess file by about 100 KB.)

Copy and paste this output into a .htaccess file and upload it to your blogs directory.

In the same directory, create an antispam.php file with the following content:

<?php
// Read the parameters passed by the RewriteRule (do not rely on register_globals)
$from = isset( $_GET['from'] ) ? $_GET['from'] : '';
$to = isset( $_GET['to'] ) ? $_GET['to'] : '';
if( !empty( $_SERVER['HTTP_REFERER'] ) || $from == '' || $to == '' )
{
	// This page must be accessed directly only
	die();
}
// Escape both values before echoing them into the page
$from = htmlspecialchars( $from, ENT_QUOTES );
$to = htmlspecialchars( $to, ENT_QUOTES );
?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Referer Spam Page</title>
<meta name="robots" content="noindex,nofollow"/>
</head>
<body>
<p>This page is displayed because you've been identified as a <a href="http://en.wikipedia.org/wiki/Keyword_spamming">spam referer</a> pretending to come from <a href="<?php echo( $from ); ?>"><?php echo( $from ); ?></a>. The <a href="http://b2evolution.net">b2evolution</a> antispam blacklist identifies the <a href="<?php echo( $from ); ?>">referer page</a> as spam. Spam sucks. Don't waste our bandwidth.</p>
<p>If you are not a spammer, please accept our apologies and reload <a href="<?php echo( $to ); ?>">your destination page</a>.</p>
</body>
</html>

When a referer spammer is identified, it is redirected to the antispam.php page (in the same directory). When using a standard browser, the redirection makes the referer disappear. If it is still there, we don't waste any more bandwidth and display nothing. Otherwise, a short message is displayed.

It's a bit late, so there might be some bugs (but an early testing procedure makes me pretty confident it works...) I've just updated [url=http://blog.lesperlesduchat.com/]my own site[/url] with the previous system. Wait and see...

5 Jun 16, 2005 06:38

ralphy wrote:

If you have the rights to override Apache behavior using a .htaccess file, you can use the following script to reduce bandwidth usage by referer spammers.

...

It's a bit late, so there might be some bugs (but an early testing procedure makes me pretty confident it works...) I've just updated [url=http://blog.lesperlesduchat.com/]my own site[/url] with the previous system. Wait and see...

ralphy, talk about duplicating effort. I recently ran into a similar problem with referers sucking up bandwidth and had to take my site offline for a while. Just tonight I finished hacking out a b2 plugin that generates an .htaccess file using the antispam table. There are a few differences though. I'm not using mod_rewrite because fewer hosts have that enabled than have mod_setenvif and mod_access. My design also automatically writes the .htaccess file, which may make it a bit more attractive to newbies or people who need to update the blocking regularly. It's too late to package this one up tonight but I'll try to publish it tomorrow. Stay tuned.
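To give an idea of the mod_setenvif and mod_access approach, a minimal sketch could look like the following. The two patterns are borrowed from the blacklist samples in this thread and the variable name spam_ref is made up for illustration; this is not the plugin's actual output.

```apache
# Flag requests whose Referer header matches a blacklisted substring
SetEnvIfNoCase Referer "-casino-" spam_ref
SetEnvIfNoCase Referer "-amateur\.blogspot\.com" spam_ref

# Let everyone in except requests carrying the flag (they get a 403)
Order Allow,Deny
Allow from all
Deny from env=spam_ref
```

This only takes effect in .htaccess if the host allows these directives to be overridden there.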

6 Jun 16, 2005 07:38

A plug-in is always better than an ugly hack. ;)

Anyway, I tried on [url=http://blog.lesperlesduchat.com]my blog[/url] to "pack" the RewriteCond's. In other words, writing this:

RewriteCond %{HTTP_REFERER} (-4-you\.info) [NC,OR]
RewriteCond %{HTTP_REFERER} (-4u\.net) [NC,OR]
RewriteCond %{HTTP_REFERER} (-adult-) [NC,OR]
RewriteCond %{HTTP_REFERER} (-amateur\.blogspot\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (-billed-) [NC,OR]
RewriteCond %{HTTP_REFERER} (-casino-) [NC,OR]


like this:

RewriteCond %{HTTP_REFERER} (-4-you\.info|-4u\.net|-adult-|-amateur\.blogspot\.com|-billed-|-casino-) [NC,OR]


For now, I packed about 100 entries per line. The advantage is that the whole .htaccess file shrinks from about 100 KB to about 33 KB. That also avoids some .htaccess parsing: the regular expressions are longer, but there are fewer of them. I haven't yet tested whether this is faster.

Another .htaccess optimization would be to skip the whole check for "trusted" referers, including the host itself (your own blog!) and some other trusted sites. Maybe testing only .php files would speed the whole thing up a bit too... I'm probably not familiar enough with .htaccess files to optimize everything as well as it could be.
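As a starting point, the trusted-referer idea might be sketched like this. The host www.example.com stands in for your own blog, and the two blacklist entries are only samples. Conditions without [OR] must all pass, so a request with no referer, or one coming from the blog itself, fails early and the long blacklist lines are never evaluated for it.

```apache
RewriteEngine On

# Skip everything below when there is no referer at all
RewriteCond %{HTTP_REFERER} !^$
# Skip requests coming from the blog itself (placeholder host)
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Blacklist entries, packed as above
RewriteCond %{HTTP_REFERER} (-adult-|-casino-) [NC]
RewriteRule .* - [F]
```

The ".php files only" idea is left out of this sketch.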

Any volunteers?

7 Jun 17, 2005 16:44

For those of you who actually manage your own web server, you should note that it is much faster to put this kind of stuff in a <Directory> directive container in the httpd.conf file instead of using .htaccess files. The httpd.conf file is loaded and read once at startup, whereas the .htaccess files are loaded and read for each request.

The only catch is that since it is read only at startup, you have to restart Apache for changes to httpd.conf to take effect. Usually, a good practice is to develop your rules in .htaccess, and then when you think you've got them the way you want them, move them to httpd.conf and get the performance gains there.

Of course, I realize that most people don't have access to httpd.conf on their shared hosting systems, but wanted to point that out for those that do.
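For illustration, the same kind of rules moved into httpd.conf might look like the sketch below. The directory path is a placeholder and the blacklist entries are samples. Note that AllowOverride None switches off every .htaccess file under that directory, so only use it if nothing else there relies on one.

```apache
# httpd.conf (the directory path is a placeholder)
<Directory "/var/www/blogs">
    # Stop Apache from looking for .htaccess files on each request
    AllowOverride None

    RewriteEngine On
    RewriteCond %{HTTP_REFERER} (-adult-|-casino-) [NC]
    RewriteRule .* - [F]
</Directory>
```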

9 Jul 25, 2005 07:42

farse wrote:

I'm aware there are other anti-spam mechanisms out there; this is based upon one of them. The others do not deal with my problem: my blog mostly resides on a few large pages, and I cannot have these bots downloading 10,000 pages a day. They've already eaten up 4GB of bandwidth this month.

Here is my solution. It's ugly but it works.

This code should be placed at the top of index.php:

Where exactly? I'm getting a parse error ..unexpected "/"

10 Jul 25, 2005 08:01

I think my .htaccess is too big. I'm getting 500 errors. I used Isaac's antispam prune and it's still huge. :'(

