1 Jun 18, 2005 00:45    

Ok, so this has been a really popular topic on these forums lately, and I've been investigating a few possibilities.

.htaccess rules based on the blacklist are simply unreasonable (with my host, at least) since there are several thousand antispam strings, and it would be crazy to make Apache chug through all of them every time there's a referral.

Checking the referring page for a link to your site doesn't always work either, since links to your site may come from mail messages or other hidden things. It also means you're delaying your page until the other page finishes loading into a file_get_contents call, which doubles your load time and triggers a back-button frenzy.
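
For clarity, that back-check approach would look something like this (just a sketch, with $baseurl standing in for your blog's URL; not something I recommend running):

<?php
// sketch only: fetch the referring page and look for a link back to us
$baseurl = 'http://example.com/blog/'; // placeholder for your own blog URL
$referer = isset( $_SERVER['HTTP_REFERER'] ) ? $_SERVER['HTTP_REFERER'] : '';
if( $referer != '' && strpos( $referer, $baseurl ) !== 0 )
{
  $page = @file_get_contents( $referer ); // blocks until the remote page has loaded
  if( $page === false || strpos( $page, $baseurl ) === false )
  { // unreachable (mail message, etc.) or no backlink: treat as spam
    header( 'HTTP/1.0 403 Forbidden' );
    die( 'No link back to this site was found on the referring page.' );
  }
}
?>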

RewriteMap doesn't work, since I can't access my httpd.conf.

Stick this in your conf/hacks.php file, and you're good to go. Very low bandwidth, prevents referrer spiders from getting past the front gate, and generally friendly.

<?php
/**
 * Bounce all referrers who are blacklisted
 * Isaac Z. Schlueter
 **/
if( !empty($_SERVER['HTTP_REFERER']) && strpos($_SERVER['HTTP_REFERER'],$baseurl) !== 0 )
{
  $referer = addslashes( $_SERVER['HTTP_REFERER'] ); // escape quotes before using the header in a query
  $is_a_spammer = $DB->get_row("select aspm_ID, aspm_string from $tableantispam 
      where '" . $referer . "' like concat('%',aspm_string,'%')");
  if( $is_a_spammer ) {
    header('HTTP/1.0 403 Forbidden');

    // un-comment the next line of code to redirect back to the referring
    // page. I didn't do this, in the event that perhaps there is a false
    // positive, but you needn't be so kind.
    // In any event, the bandwidth is teensy either way.
    // header('Location: ' . $_SERVER['HTTP_REFERER']);
    ?>
    <html><head><title>Stop Referrer Spam!</title>
    </head><body>
    <p>You are being denied access to this page because you have been referred here by a 
      known spammer: [<?php echo htmlspecialchars($_SERVER['HTTP_REFERER']) ?>].</p>
    <p>If you have reached this page in error, feel free to
      <a href="<?php echo $baseurl . $ReqURL ?>">bypass this message</a> with our
      apologies. Please leave a comment telling us to stop 
      blacklisting sites matching [<?php 
        echo $is_a_spammer->aspm_string 
      ?>] so that this
      doesn't happen again.</p>
    <p>Thank you, and sorry for the inconvenience.</p>
    <p>If, on the other hand, you are a bandwidth-eating referrer spam robot,
      then we hope that your owner dies a painful death and rots in hell, 
      and that his or her seed is scrubbed from the face of the earth.</p>
    <p style="text-align:center">--The Management</p>
    </body></html>
    <?php
    die();
  }
}
?>

2 Jun 18, 2005 02:01

I've applied the hack. Thanks. I laughed out loud when I read it.

3 Jun 18, 2005 03:59

I haven't done any speed tests, but I wonder whether Apache would really be slower using an .htaccess file than a PHP script querying database tables...

Doesn't putting b2evolution's blacklist into an .htaccess file make b2evolution's own checks redundant?

(Nice message. ;) )

4 Jun 18, 2005 22:03

ralphy wrote:

I haven't done any speed tests, but I wonder whether Apache would really be slower using an .htaccess file than a PHP script querying database tables...

I was wondering about this too. I'd love to find the fastest, most CPU-efficient and least bandwidth-intensive way of checking for spammers, because one of my hosts killed one of my sites due to too many CPU cycles caused by all the page hits from spammers. Anyway, I think that using Apache itself simply must be more efficient than having Apache call PHP call MySQL for each page hit. Cool idea though. If it's actually faster to do it this way, could someone explain why?

5 Jun 19, 2005 05:41

Here are some preliminary page loading speed test results:


+-------------+----------+----------+----------+----------+
| URL         |   SIZE   |    NO    |  PACKED  |  FULL    |
+-------------+----------+----------+----------+----------+
| Google.com  |     6 KB |   202 ms |     -    |     -    |
| Yahoo.com   |    69 KB | 1,504 ms |     -    |     -    |
+-------------+----------+----------+----------+----------+
| Google.html |     6 KB |     3 ms |   132 ms |   157 ms |
| Yahoo.html  |    69 KB |    14 ms |   142 ms |   174 ms |
+-------------+----------+----------+----------+----------+
| All         |    33 KB | 1,599 ms | 1,859 ms | 1,809 ms |
| Blog        |    35 KB | 1,050 ms | 1,084 ms | 1,230 ms |
| Post        |    35 KB |   627 ms |   711 ms |   737 ms |
+-------------+----------+----------+----------+----------+

In the above table:
-Google.com is the http://www.google.com page
-Yahoo.com is the http://www.yahoo.com page

-Google.html is a local static page with the same content as Google.com
-Yahoo.html is a local static page with the same content as Yahoo.com

-All is a local "All" blogs page
-Blog is another local blog page
-Post is a single post page

-NO means there was no .htaccess file at all
-PACKED means there was a "packed" .htaccess file
-FULL means there was a "full" .htaccess file

The PACKED and FULL .htaccess files each contain about 1,800 blacklisted strings to be checked.

The PACKED .htaccess file checks several strings at once per line (lines are kept under 8 KB; Apache doesn't seem to handle longer ones):

RewriteCond %{HTTP_REFERER} (-4-you\.info|-4u\.net|-adult-|etc.) [NC]
RewriteRule .* - [F]

The FULL .htaccess file checks one string per line:

RewriteCond %{HTTP_REFERER} (-4-you\.info) [NC,OR]
RewriteCond %{HTTP_REFERER} (-4u\.net) [NC,OR]
RewriteCond %{HTTP_REFERER} (-adult-) [NC,OR]
etc.
RewriteRule .* - [F]

6 Jun 19, 2005 05:45

What that tells me is that a lot more time is spent rendering the PHP than reading a full .htaccess blacklist, so it's probably not a concern as long as it's not too processor-intensive. Also, I'd be interested in finding out whether my SetEnvIf .htaccess method is faster or slower than the rewrite method.
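
Roughly, the SetEnvIf approach looks like this (strings borrowed from the packed example above, just for illustration):

SetEnvIfNoCase Referer (-4-you\.info|-4u\.net|-adult-) referer_spam
Order Allow,Deny
Allow from all
Deny from env=referer_spam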

7 Jun 19, 2005 05:46

+-------------+----------+----------+----------+----------+
| URL         |   SIZE   |    NO    |  PACKED  |  FULL    |
+-------------+----------+----------+----------+----------+
| Google.com  |     6 KB |   202 ms |     -    |     -    |
| Yahoo.com   |    69 KB | 1,504 ms |     -    |     -    |
+-------------+----------+----------+----------+----------+
| Google.html |     6 KB |     3 ms |   132 ms |   157 ms |
| Yahoo.html  |    69 KB |    14 ms |   142 ms |   174 ms |
+-------------+----------+----------+----------+----------+
| All         |    33 KB | 1,599 ms | 1,859 ms | 1,809 ms |
| Blog        |    35 KB | 1,050 ms | 1,084 ms | 1,230 ms |
| Post        |    35 KB |   627 ms |   711 ms |   737 ms |
+-------------+----------+----------+----------+----------+

Apparently, .htaccess files save bandwidth but cost CPU. It would be interesting to compare the .htaccess-based and PHP-based strategies for fighting logs/stats spam in terms of CPU usage.

I'm going to test [url=http://forums.b2evolution.net/viewtopic.php?t=4483#21324]Kweb's idea about httpd.conf[/url] later today:

Kweb wrote:

For those of you who actually manage your own web server, you should note that it is much faster to put this kind of stuff in a <Directory> directive container in the httpd.conf file instead of using .htaccess files. The httpd.conf file is loaded and read once at startup, whereas the .htaccess files are loaded and read for each request.
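
If I understand it right, that would look something like this in httpd.conf (the path is only an example):

<Directory "/var/www/blogs">
    RewriteEngine On
    RewriteCond %{HTTP_REFERER} (-4-you\.info|-4u\.net|-adult-) [NC]
    RewriteRule .* - [F]
</Directory>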

8 Jun 19, 2005 18:40

if only we could block at the switch

9 Jun 19, 2005 18:49

How often do you all report spammers to their ISPs and/or web hosts?

10 Jun 20, 2005 05:02

I added another check in there so that it doesn't keep you from clicking from your own antispam page (quite annoying!)

11 Jun 20, 2005 05:38

Two very interesting articles about referrer spam:
[url=http://underscorebleach.net/jotsheet/2005/01/referrer-spam-proposal]Proposal on referrer spam: Background and blacklists[/url]
[url=http://blog.centresource.com/2005/04/30/referrercomment-spam/]Referrer and Comment Spam: A Primer[/url]

The MT-Blacklist plugin spammer blacklist:
[url=http://www.jayallen.org/comment_spam/blacklist.txt]MT-Blacklist Master Copy[/url]

A PHP script intended to update a .htaccess file with referrer spammers after Apache logs analysis:
[url=http://g-blog.net/user/Gossip/entry/15472]Referrer Spam Fucker 3000[/url]

Using [url=http://en.wikipedia.org/wiki/DNSBL]DNSBL[/url] appears to be a nice idea:
[url=http://bradchoate.com/weblog/2004/11/05/mt-dsbl]MT-DSBL[/url]
[url=http://weblog.sinteur.com/index.php?p=7967]New anti-spam trick[/url] using [url=http://dsbl.org/]DSBL[/url]

The [url=http://en.wikipedia.org/wiki/DNSBL]DNSBL[/url] can be used directly from Apache, as explained there:
[url=http://chris.quietlife.net/?p=439]referrer-b-gone[/url]

Finally, a page to test the referrer spam rules (it allows setting a referrer before loading a given page):
[url=http://www.wannabrowser.com/]WANNABrowser[/url]

Also worth reading is this discussion about fighting spam (especially blocking HTTP_USER_AGENTs):
[url=http://www.webmasterworld.com/forum13/687.htm]A Close to perfect .htaccess ban list[/url]
Don't miss andreasfriedrich's .htaccess and httpd.conf single and multiple RewriteCond benchmarks:
[url=http://www.webmasterworld.com/forum13/687-8-10.htm]A Close to perfect .htaccess ban list (Page #8)[/url]

12 Jun 20, 2005 06:56

I've read several of those articles because referrer spam has become a really big problem for me, even going so far as to force me to close my site for the time being. Most of the commentators go on about how using .htaccess is an unwinnable fight (maybe so if you're doing it by hand, hence my automatic "Refer This!" script), yet they really have no better solution than what I'm doing: blocking based on a blacklist (in my case the b2 antispam blacklist generates the .htaccess file). My quest is to find the least processor-intensive way of dealing with the problem, because that's why I had to pull my site; it was using too many processor cycles when the spambots accessed the page, which caused a database lookup. As far as I can tell, the best we can do now is to generate an .htaccess blacklist based on the database. Of course I will make sure I'm not publishing referrer information either. If anyone else has suggestions, I'm all ears.

13 Jun 20, 2005 06:58

isaac wrote:

.htaccess rules based on the blacklist are simply unreasonable (with my host, at least) since there are several thousand antispam strings, and it would be crazy to make Apache chug through all of them every time there's a referral.

Isaac, can you explain to me how doing a relational database lookup is less processor-intensive than the flat-file (.htaccess) method, which only involves Apache? I really do need to find a way to block all these false hits without draining too many resources.

14 Jun 20, 2005 15:52

I'm not sure, Ben. In fact, it seems to me that you should be right - .htaccess should be faster.

However, when I put more than about 500 RewriteCond directives in my .htaccess, my site suddenly gets slow as molasses.

There are a lot of reasons why that might be, and I'm not enough of an Apache expert to say why. I know that relational database lookups are really not that slow, depending on a few factors, because of all sorts of indexing and whatnot that goes on in the backend. Then again, I've got more experience with that side of things on MSSQL than on MySQL, so I can't really come up with figures or very sound explanations. On the other hand, Apache must actually read, process, and cache all the RewriteConds until it gets to a RewriteRule. Perhaps packing the rules, and having several Cond/Rule bunches, could optimize this somewhat. I'm not sure.

Without any doubt, more research is definitely needed. And, like birth control, I suspect that multiple methods of prevention are the way to go. :)

15 Jun 21, 2005 00:03

I used [url=http://www.webmasterworld.com/forum13/687-8-10.htm]andreasfriedrich[/url]'s .htaccess vs httpd.conf HTTP_USER_AGENT Blocking Benchmark figures to display them in a more readable form (click on the image to enlarge):

[url=http://blog.lesperlesduchat.com/media/external/lpdc_htaccess_httpd_conf_http_user_agent_blocking_benchmarks.png]http://blog.lesperlesduchat.com/media/external/lpdc_htaccess_httpd_conf_http_user_agent_blocking_benchmarks_thumb.png[/url]

It appears "Multiple RewriteCond" .htaccess (several conditions packed on the same line using regular expression's OR) appear pretty fine, even if slower than "Simple RewriteCond" httpd.conf (one condition per line).

16 Jun 21, 2005 00:43

Has anyone seen any speed comparisons using the htaccess file but NOT mod_rewrite? I want to try and use the SetEnvIfNoCase method (the way Spam F'er 3000 does it) instead. IIRC mod_rewrite uses a lot more overhead.

17 Jun 21, 2005 01:44

I benchmarked the [url=http://weblog.sinteur.com/index.php?p=7967]New anti-spam trick[/url] about using the [url=http://dsbl.org/]Distributed Sender Blackhole List[/url] services.

The BlockUntrustedVisitors function (see below) takes approximately 0.86 to 1.08 ms per call, with an average of about 1 ms on my server (that speed may vary a lot depending on your server's ping and bandwidth):



function BlockUntrustedVisitors()
{
    $VisitorIP = $_SERVER[ 'REMOTE_ADDR' ];
    list( $a, $b, $c, $d ) = explode( ".", $VisitorIP );
    if( gethostbyname( "$d.$c.$b.$a.list.dsbl.org" ) != "$d.$c.$b.$a.list.dsbl.org" )
    {
        // Not trusted
        header( "Location: http://dsbl.org/listing?".$VisitorIP );
        die();
    }
}

It's nothing compared to the 737 to 1,809 ms per displayed b2evolution page on my site! So it appears to be quick enough to call before every page is displayed.

18 Jun 21, 2005 03:29

I think we may have to combine methods because I don't know that too many referrer spammers are going to be listed in an IP based open-relay blacklist. I do like that it's so fast though.

19 Jun 21, 2005 03:56

After reading this I've been working on reducing the number of items in the antispam blacklist. There are a lot of redundant entries, so I've been removing all the entries that are covered by broader entries. This won't help you much, since you've already got the updates, but I'm hoping that fresh installs won't have as long a list, even after updating. You can go through your list and remove redundant blacklist items by clicking on the Allow link.

20 Jun 21, 2005 04:00

That brings up a good point though. At some point down the road we may want to have a method of removing blacklist entries via an update as well.

21 Jun 21, 2005 04:32

While we're dreaming, it would be nice if the blacklist could include regular expressions. As it is now, there's no wildcards at all.

22 Jun 21, 2005 07:37

personman wrote:

After reading this I've been working on reducing the number of items in the antispam blacklist. There are a lot of redundant entries, so I've been removing all the entries that are covered by broader entries. This won't help you much, since you've already got the updates, but I'm hoping that fresh installs won't have as long a list, even after updating. You can go through your list and remove redundant blacklist items by clicking on the Allow link.

While I've been cleaning up my local b2evolution blacklist, I noticed some entries could be removed automatically by testing substrings: if one entry is a substring of other entries, all those longer entries are redundant and can be removed.

It then becomes reasonable to include the [url=http://www.jayallen.org/comment_spam/blacklist.txt]MT-Blacklist Master Copy[/url] while avoiding duplicates.
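
Roughly, the substring test could be done like this in PHP (connection settings are placeholders, and the table/field names are the ones from b2evolution's antispam table, to be checked against your install):

<?php
// Sketch: delete blacklist entries that contain another, shorter entry as a substring.
mysql_connect( 'localhost', 'username', 'password' );
mysql_select_db( 'database' );

$result = mysql_query( "SELECT aspm_ID, aspm_string FROM evo_antispam" );
$entries = array();
while( $row = mysql_fetch_assoc( $result ) )
{
	$entries[ $row['aspm_ID'] ] = $row['aspm_string'];
}

foreach( $entries as $id => $string )
{
	foreach( $entries as $other_id => $other_string )
	{
		// another entry is contained in this one, so this one is redundant
		// (for exact duplicates, keep the one with the smaller ID)
		if( $id != $other_id
			&& strpos( $string, $other_string ) !== false
			&& ( $string != $other_string || $other_id < $id ) )
		{
			mysql_query( "DELETE FROM evo_antispam WHERE aspm_ID = " . intval( $id ) );
			unset( $entries[ $id ] );
			break;
		}
	}
}
?>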

personman wrote:

While we're dreaming, it would be nice if the blacklist could include regular expressions. As it is now, there's no wildcards at all.

I haven't checked the code yet... If it's not the case, it should be.

BenFranske wrote:

That brings up a good point though. At some point down the road we may want to have a method of removing blacklist entries via an update as well.

Maybe it would be interesting to keep a "spammer log", in order to check whether a spammer is still active and whether a given entry in the blacklist can be removed or not.

23 Jun 21, 2005 08:03

BenFranske wrote:

I think we may have to combine methods because I don't know that too many referrer spammers are going to be listed in an IP based open-relay blacklist. I do like that it's so fast though.

[url=http://dsbl.org/listing]dsbl.org[/url] is aimed at e-mail servers. However, it includes the IPs of insecure machines, including open proxies. That should help block some spammers.

[url=http://bradchoate.com/projects/spamlookup/]SpamLookup[/url] (a "Movable Type plugin for identifying and eliminating weblog spam") uses [url=http://bradchoate.com/projects/spamlookup/wiki/SpamIdentification]several techniques to identify spams[/url]. The IP-based identification uses both [url=http://bsb.empty.us]bsb.empty.us[/url] ("The BSB is a database of IP addresses that have sent "comment spam") and [url=http://opm.blitzed.org]opm.blitzed.org[/url] ("The Blitzed Open Proxy Monitor List is a DNS-based list of machines believed to run insecure proxies. These proxies have been abused in the past to either send spam or connect to an IRC network running a version of Blitzed Open Proxy Monitor, such as Blitzed").

The [url=http://www.jayallen.org/comment_spam/]MT-Blacklist/Comment Spam Clearinghouse[/url] page declares about [url=http://bradchoate.com/projects/spamlookup/]SpamLookup[/url]:

Suffice it to say, in looking over my logs every day, SpamLookup blocked about 95% of my spam, moderated the other 4.9999999% and blocked exactly zero real comments and TrackBacks. There has been only ONE spam that has reached my server since I first installed it and I'm pretty sure that that must have been hand-entered. If we've made them resort to hand-entering spam, we've won the war.

The war is probably not won, but we could win a battle...

24 Jun 21, 2005 18:15

There's an unintended side-effect of this hack. When I click on re-check in the antispam tab for an item, it takes me to the abuse page.

25 Jun 21, 2005 19:20

personman,

I ran into that, too. I corrected it in the first post in this thread. Just grab the code again, and you'll be fine. Now it checks if your baseurl is at the start of the referrer, and if so, skips the whole deal.

26 Jun 21, 2005 20:30

That fixed it. Thanks.

27 Jun 21, 2005 21:06

Check out http://isaacschlueter.com/download/b2antispam_genhtaccess.php

This script allows you to set how many spammers should go in a single regex, along with whether to use a single RewriteRule with several RewriteCond's or a separate RewriteRule for each RewriteCond.

Also, it generates rules that use SetEnvIfNoCase to set referer_spam to true.

I'm not sure which one would be faster - many many Conds with one Rule, a single Cond/Rule pair, SetEnv with a Deny/Allow directive, etc. I haven't tested these out, and don't really have the time/resources to adequately do so, so if one of you fine folks want to take that on, be my guest.

I figured out why I was getting HTTP 500 errors the last time I tried this - I wasn't escaping all the special chars that sometimes show up in the aspm_string field. :oops: Also, I had been using a separate RewriteRule for each one, like this:

RewriteCond %{HTTP_REFERER} .*ASPM_STRING_FIELD_FROM_DB.*
RewriteRule .* - [F]

The .htaccess file was gigantic. I've read somewhere in all this that a really big .htaccess file will slow things down, because Apache has to load the whole thing into memory before doing anything else. That could have just been someone's theory, though, and I'm really not sure why I had such problems. (Could have just been my host was in a bad mood or something.)

I'm not sure whether mod_rewrite or mod_setenvif will be faster, but they each have their strengths. With mod_rewrite you can send spammers to a special page or something, but with mod_setenvif you can set an environment variable. If you have a php ErrorDocument for 403s, or a [url=http://isaacschlueter.com/error/http_200]blog devoted to that purpose[/url], then you can check that environment variable and provide some help for possible false positives.
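
For instance, a 403 ErrorDocument could do something like this (only a sketch; 'referer_spam' is whatever variable name you set with SetEnvIfNoCase):

<?php
// error403.php -- point to it with "ErrorDocument 403 /error403.php" in .htaccess.
// Apache prefixes the original request's variables with REDIRECT_ when it serves
// an ErrorDocument, so check both names.
$flagged = !empty( $_SERVER['referer_spam'] ) || !empty( $_SERVER['REDIRECT_referer_spam'] );

header( 'HTTP/1.0 403 Forbidden' );
if( $flagged )
{
  echo '<p>Your referrer matched our spam blacklist. If you are a real person, please '
     . 'come back without a referrer (type the address in directly) and leave us a comment.</p>';
}
else
{
  echo '<p>Access denied.</p>';
}
?>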

When I decide which one would be better, I'll write up a cronnable script that will write the antispam rules to your .htaccess. What do you all think?

28 Jun 21, 2005 23:33

isaac wrote:

I'm not sure which one would be faster - many many Conds with one Rule, a single Cond/Rule pair, SetEnv with a Deny/Allow directive, etc. I haven't tested these out, and don't really have the time/resources to adequately do so, so if one of you fine folks want to take that on, be my guest.
[...]
When I decide which one would be better, I'll write up a cronnable script that will write the antispam rules to your .htaccess. What do you all think?

My own benchmarks:

+-------------+----------+----------+----------+----------+
| URL         |   SIZE   |    NO    |  PACKED  |  FULL    |
+-------------+----------+----------+----------+----------+
| Google.com  |     6 KB |   202 ms |     -    |     -    |
| Yahoo.com   |    69 KB | 1,504 ms |     -    |     -    |
+-------------+----------+----------+----------+----------+
| Google.html |     6 KB |     3 ms |   132 ms |   157 ms |
| Yahoo.html  |    69 KB |    14 ms |   142 ms |   174 ms |
+-------------+----------+----------+----------+----------+
| All         |    33 KB | 1,599 ms | 1,859 ms | 1,809 ms |
| Blog        |    35 KB | 1,050 ms | 1,084 ms | 1,230 ms |
| Post        |    35 KB |   627 ms |   711 ms |   737 ms |
+-------------+----------+----------+----------+----------+

as well as [url=http://www.webmasterworld.com/forum13/687-8-10.htm]andreasfriedrich[/url]'s benchmark:

[url=http://blog.lesperlesduchat.com/media/external/lpdc_htaccess_httpd_conf_http_user_agent_blocking_benchmarks.png]http://blog.lesperlesduchat.com/media/external/lpdc_htaccess_httpd_conf_http_user_agent_blocking_benchmarks_thumb.png[/url]

lead to the same conclusion: the more conditions you pack into a single RewriteCond regular expression in your .htaccess, the better. However, that does not appear to be true for httpd.conf.

I understand SetEnvIfNoCase does not use the same module as RewriteCond, but both behave in the same way: they identify a referrer using regular expressions (one per line or packed together) and, once it is identified, they react in a given manner. From an algorithmic point of view, I don't see why the speed would be different in the two cases.

My own script packs as many RewriteCond conditions as possible into each line (Apache supports up to 8192 bytes per .htaccess line):


<?php
if( !empty( $_SERVER['HTTP_REFERER'] ) )
{
	die( '<p>This page must be accessed directly (not refered).</p>' );
}

// Default variable values
if( !isset( $host ) ) $host = 'localhost';
if( !isset( $username ) ) $username = 'username';
if( !isset( $password ) ) $password = 'password';
if( !isset( $database ) ) $database = 'database';
if( !isset( $prefix ) ) $prefix = 'evo_';

// Connect to the database
mysql_connect( $host, $username, $password ); 
mysql_select_db( $database );

// Select spamming substrings
$query = "SELECT aspm_string FROM `".$prefix."antispam`";
$result = mysql_query( $query );
if( !$result )
{
	die( '<p>Invalid query: '. mysql_error().'</p>' );
}
$num_rows = mysql_num_rows( $result );
if( $num_rows < 1 )
{
	die( '<p>Empty antispam list</p>' );
}

echo( "<p>Use the following code to create a <em>.htaccess</em> file in your <a href=\"http://b2evolution.net\">b2evolution</a> blogs folder on the server (see your system administrator for Apache configuration to check you can override web server behavior with <em>.htaccess</em> files):</p><hr/>" );
echo( "<code><p>" );
echo( "# Activate rewrite rules<br/>" );
echo( "RewriteEngine On<br/><br/>" );
echo( "# Block referer spam<br/>" );



$LineLengthMax = 8192; // Maximum line length (Apache .htaccess limitation is 8192)
$LineStart = "RewriteCond %{HTTP_REFERER} (";
$LineLengthStart = strlen( $LineStart );
$LineSeparator = ""; // The initial value of the separator is volountarily left empty
$LineLengthSeparator = 0;
$LineNext = ") [NC,OR]<br/>"; // In fact the "<br/>" tag is longer than a "\n" character, but that won't change a lot here
$LineLengthNext = strlen( $LineNext );
$LineFinal = ") [NC]<br/>"; // Must be shorter than $LineNext!

$Line = $LineStart;
$LineLength = strlen( $Line ) + $LineLengthNext;

while( $NewEntry = mysql_fetch_row( $result ) )
{
	// Format conversion
	$NewEntry = preg_replace( "/([\\.\\%])/", "\\\\$1", $NewEntry[ 0 ] );
	
	// What's the length of the new entry?
	$LineLengthNew = strlen( $NewEntry );

	// What would be the total length of the current line if appended the new entry?
	$LineLength += $LineLengthNew + $LineLengthSeparator;

	// Is the total length short enough?
	if( $LineLength < $LineLengthMax )
	{
		// Yes, we can append the new entry to the current line
		$Line .= $LineSeparator . $NewEntry;

		$LineSeparator = "|";
		$LineLengthSeparator = 1; //strlen( $LineSeparator );
	}
	else
	{
		// No, we have to terminate current line and begin a new one

		// Terminate the current line
		$Line .= $LineNext;

		// Flush the current line
		echo( $Line );

		// Begin a new line
		$Line = $LineStart . $NewEntry;
		$LineLength = $LineLengthStart + $LineLengthNew + $LineLengthNext;
	}
}

// Close the final line
$Line .= $LineFinal;
echo( $Line );

// Everyone can access the following file (even spammers)
echo( "RewriteCond %{REQUEST_URI} !(antispam\.php) [NC]<br/>" );

// Choose the referer spammer behavior
// (one and only one of the two following lines must be uncommented)
echo( "#RewriteRule .* - [F]<br/>" ); // Stop (minimum bandwith usage)
echo( "RewriteRule .* antispam.php?from=%{HTTP_REFERER}&to=%{REQUEST_URI} [R=302,L]<br/>" ); // Redirect to another page (returning a 'temporary redirect' status)

echo( "</p></code>" );

?>

The above script can be run at the following page:
http://blog.lesperlesduchat.com/antispam_generator.php

29 Jun 22, 2005 00:11

isaac wrote:

Check out http://isaacschlueter.com/tests/b2antispam_htaccess.php

This script allows you to set how many spammers should go in a single regex, along with whether to use a single RewriteRule with several RewriteCond's or a separate RewriteRule for each RewriteCond.

Also, it generates rules that use SetEnvIfNoCase to set referer_spam to true.

I'm not sure which one would be faster - many many Conds with one Rule, a single Cond/Rule pair, SetEnv with a Deny/Allow directive, etc. I haven't tested these out, and don't really have the time/resources to adequately do so, so if one of you fine folks want to take that on, be my guest.

...

I'm not sure whether mod_rewrite or mod_setenvif will be faster, but they each have their strengths. With mod_rewrite you can send spammers to a special page or something, but with mod_setenvif you can set an environment variable. If you have a php ErrorDocument for 403s, or a [url=http://isaacschlueter.com/error/http_200]blog devoted to that purpose[/url], then you can check that environment variable and provide some help for possible false positives.

When I decide which one would be better, I'll write up a cronnable script that will write the antispam rules to your .htaccess. What do you all think?

You don't want to put both the RewriteConds and the SetEnvIfs in the same .htaccess file, because that would make it really, really huge and slow (hopefully I just misread that).

Regarding the cronnable script: I would appreciate it if I could call the script without a local cron job. I do most of my hosting on shared servers and don't have access to cron; however, I do have cron access on my dev servers, so my preferred method is to create a cron job on the dev server, something like 'wget http://myproductionserver.com/blogs/admin/make_htaccess.php?action=doit', you get the idea. This means I can execute the job from another server, which makes life much easier.

30 Jun 22, 2005 00:18

ralphy wrote:

I understand SetEnvIfNoCase does not use the same module as RewriteCond, but both behave in the same way: they identify a referrer using regular expressions (one per line or packed together) and once identified, they react in a given manner. I don't see why the speed would be different in both cases from the algorithmic point of view.

My hypothesis would be that because they (the two different modules) were probably coded by two different groups, they may have used different methods/algorithms or contain different bugs, which could result in different speeds. It wouldn't be the first time. I think it's worth testing if we can.

Perhaps we even want to support both methods and let people choose between them depending on whether or not they have mod_rewrite access, maybe even a third method of checking on each page load (the early hacks.php hack) for people who can't use .htaccess files on their host. Once we develop the methods and algorithm we're going to use, it shouldn't be difficult to support multiple ways of achieving the result.

I am by no means a programmer, though I understand most of it, but if you need help beta testing or looking at different ways of doing things, I'm your man. We also might want to keep in mind whatever changes are in the CVS pipe for 0.9.2. The only reason I didn't do my initial work on that platform was that the antispam things seemed to still be up in the air.

31 Jun 22, 2005 11:19

BenFranske wrote:

My hypothesis would be that because they (the two different modules) were probably coded by two different groups, they may have used different methods/algorithms or contain different bugs, which could result in different speeds. It wouldn't be the first time. I think it's worth testing if we can.

You're right. It's worth testing.

BenFranske wrote:

Perhaps we even want to support both methods and let people choose between them depending on whether or not they have mod_rewrite access, maybe even a third method of checking on each page load (the early hacks.php hack) for people who can't use .htaccess files on their host. Once we develop the methods and algorithm we're going to use, it shouldn't be difficult to support multiple ways of achieving the result.

I agree once again.

BenFranske wrote:

I am by no means a programmer, though I understand most of it, but if you need help beta testing or looking at different ways of doing things, I'm your man. We also might want to keep in mind whatever changes are in the CVS pipe for 0.9.2. The only reason I didn't do my initial work on that platform was that the antispam things seemed to still be up in the air.

It is interesting to notice that several people here have been talking about several different ways to reduce the effects of spam on their blogs and servers. It would be useful to take stock of them. Making a strong antispam plug-in for b2evolution with all the options listed here and there would be extremely useful for some users.

It appears referral spamming is the main issue for people seriously annoyed by heavy spam attacks... However, comment, ping and trackback spam should not be underestimated in the future.

On my own blog, my .htaccess based on b2evolution's antispam blacklist redirects about 2.5% of the whole traffic of my site (in terms of page views) to my referral spammers page. It would also be interesting to see how much spam traffic other blogs get.

32 Jun 22, 2005 15:20

Ben,

It'll probably be something like the scripts that I use to update and recheck my blacklist. http://isaacschlueter.com/admin/b2antispam_recheck.php and http://isaacschlueter.com/admin/b2antispam_poll.php

I basically just trim out everything from the antispam page except for the bare essentials. I took out the login requirement, too - since I'm cronning it every 6 hours, who cares if some random person makes my page update the blacklist a little more often?

The advantage of this is that you can trigger the script to do its thing either by cronning it locally to be parsed by the PHP processor, or by using wget from another server, or even by setting up a scheduled task in Windows to open up IE and have it load a particular address. I know that the HTTP spec doesn't recommend using GET for anything that takes action or changes things, but ya know, this is just extremely convenient! :)
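
For instance, either of these crontab lines would do it (the path and URL are just examples):

# run the poller locally with the PHP CLI every 6 hours
0 */6 * * * php /home/me/public_html/blog/admin/b2antispam_poll.php

# ...or hit it over HTTP from another machine entirely
0 */6 * * * wget -q -O /dev/null http://example.com/blog/admin/b2antispam_poll.php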

33 Jun 22, 2005 23:30

Yep, something like that was what I had in mind.

Does anyone have any insight as to what's going on with the antispam features in 0.9.2? Last time I checked these were still unfinished in CVS.

34 Jun 23, 2005 19:17

Check this out:
http://isaacschlueter.com/tests/b2antispam_genhtaccess.php.txt

Save it in your admin folder, strip the .txt off of the filename, and run it.
Unless you set the $do_write option to true, it won't actually write anything, but will just tell you what it WOULD have done. If it's what you want, you can set it to true.

Other than that, just follow the instructions.

Thoughts? Comments?

35 Jun 24, 2005 00:47

I had to make a few changes (notably adding quotes around the regular expression) to keep my server from giving a 500 error. I also added a sample $rule for people without mod_rewrite.

Build 2 is available at http://t1.franske.com/cjmedia/b2antispam_genhtaccess.php.txt

Wishlist: Allow the creation of multiple .htaccess files at once... I have b2 sites in several directories and am using different virtual hosts for each but they use the same backend b2 and as such I need to generate several htaccess files (one in each directory) at once.

36 Jun 24, 2005 01:09

Thanks, ben :)

I also had to add a bit to turn \n's into Windows-style \r\n's, or else I ended up with some screwy rules.

I might be able to implement that wishlist and have a new version this weekend sometime, but it depends on when I end up seeing Batman Begins :) It wouldn't be terribly hard. Just define the .htaccess files as an array, and then loop through the whole thing with a foreach, as sketched below. (I'm assuming that you'd be able to have write access to all of them from the same admin URL. Or is it trickier than that?)
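
Something like this, I mean (the paths and the two helper functions are placeholders for whatever the script actually does for a single file):

<?php
// paths and both helper functions below are placeholders
$htaccess_files = array(
	'/home/me/public_html/site1/.htaccess',
	'/home/me/public_html/site2/.htaccess',
	'/home/me/public_html/site3/.htaccess',
);

$rules = generate_antispam_rules();        // build the RewriteCond/RewriteRule block once
foreach( $htaccess_files as $htaccess_file )
{
	write_rules_to( $htaccess_file, $rules ); // then write it into each file
}
?>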

EDIT:
Whoops, my proposed revision to that rule won't work! Deny From env= only tests for the existence of an env var, so setting it back to 0 if the referrer is you just bans yourself.
Back to the drawing board...

37 Jun 24, 2005 02:01

Yes, I have write access to all of them. I was thinking of a CSV array, and the reason I didn't write it was that I didn't want to write the parsing code, but now that you mention it... a real array would work fine with a foreach... I should be able to hack that out myself... eeeks, it's been a while since I did much programming. Don't know if I'll get to it before this weekend anyway, though.

39 Jun 24, 2005 15:02

ralphy wrote:

I benchmarked the [url=http://weblog.sinteur.com/index.php?p=7967]New anti-spam trick[/url] about using the [url=http://dsbl.org/]Distributed Sender Blackhole List[/url] services.

The BlockUntrustedVisitors function (see below) takes approximately 0.86 to 1.08 ms per call, with an average of about 1 ms on my server (that speed may vary a lot depending on your server's ping and bandwidth):



function BlockUntrustedVisitors()
{
    $VisitorIP = $_SERVER[ 'REMOTE_ADDR' ];
    list( $a, $b, $c, $d ) = explode( ".", $VisitorIP );
    if( gethostbyname( "$d.$c.$b.$a.list.dsbl.org" ) != "$d.$c.$b.$a.list.dsbl.org" )
    {
        // Not trusted
        header( "Location: http://dsbl.org/listing?".$VisitorIP );
        die();
    }
}

It's nothing compared to the 737 to 1,809 ms per displayed b2evolution page on my site! So it appears to be quick enough to call before every page is displayed.

There is an interesting [url=http://www.declude.com/Articles.asp?ID=97]List of All Known DNS-based Spam Databases[/url]. Using several services might reduce spamming, but then a "cache" system would be welcome, since pinging every database before displaying a web page might take a while.
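
A simple per-IP cache could look like this (the cache file, TTL and zone are arbitrary choices; just a sketch):

<?php
// cache DNSBL lookups per IP in a flat file, so each zone is queried at most once per TTL per visitor
function is_listed_cached( $ip, $zone = 'list.dsbl.org', $ttl = 86400, $cache_file = '/tmp/dnsbl_cache.ser' )
{
	$cache = array();
	if( is_readable( $cache_file ) )
	{
		$data = unserialize( file_get_contents( $cache_file ) );
		if( is_array( $data ) ) $cache = $data;
	}

	$key = $zone . '/' . $ip;
	if( isset( $cache[ $key ] ) && $cache[ $key ]['time'] > time() - $ttl )
	{
		return $cache[ $key ]['listed']; // still fresh, skip the DNS query
	}

	list( $a, $b, $c, $d ) = explode( '.', $ip );
	$query = "$d.$c.$b.$a.$zone";
	$listed = ( gethostbyname( $query ) != $query ); // gethostbyname() returns the name unchanged on failure

	$cache[ $key ] = array( 'time' => time(), 'listed' => $listed );
	$fp = fopen( $cache_file, 'w' );
	if( $fp )
	{
		fwrite( $fp, serialize( $cache ) );
		fclose( $fp );
	}
	return $listed;
}
?>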

For now, I use [url=http://dsbl.org/]dsbl.org[/url], [url=http://opm.blitzed.org/]blitzed.org[/url] and [url=http://bsb.empty.us/]bsb.empty.us[/url] (that last one for both IP and referrer domain) before displaying any page of my blogs. It appears to be quick enough for me.

40 Jun 24, 2005 15:20

isaac wrote:

Checking the referring page for a link to your site doesn't always work either, since links to your site may come from mail messages or other hidden things. It also means you're delaying your page until the other page finishes loading into a file_get_contents call, which doubles your load time and triggers a back-button frenzy.

You are right to notice that some pages can refer to a site without being accessible from that site. E-mails are a good example. However, most referring pages are accessible. Since January 1st, my site has had 23,421 connections from an external page (excluding search engines) and 43 connections from e-mails. If a referring web page cannot be accessed, it is suspicious.

Loading referring pages on a crontab basis would help identify potential referrer spam. The next time an administrator connects, he/she can check the suspected spam referrers. A white list would help avoid disturbing the administrator with authorized sites.

41 Jun 25, 2005 11:59

isaac wrote:

Ben,

It'll probably be something like the scripts that I use to update and recheck my blacklist. http://isaacschlueter.com/admin/b2antispam_recheck.php and http://isaacschlueter.com/admin/b2antispam_poll.php

I basically just trim out everything from the antispam page except for the bare essentials. I took out the login requirement, too - since I'm cronning it every 6 hours, who cares if some random person makes my page update the blacklist a little more often?

Whoa that is excellent. Hope you don't mind if I use them? I have been looking for a way to use cron to update the blacklist. :D

43 Jul 05, 2005 14:47

With blacklists becoming huge, it becomes interesting to reduce the number of items to test. So, I've developed a (very ugly, unoptimized and crappy) PHP script intended to generate an optimized filter for blocking refspammers.

The hack takes two lists as parameters, a blacklist and a whitelist, and outputs an "optimized" list of regular expressions that catch most/all of the blacklisted items while catching no/few whitelisted ones.

The principle is simple: for every blacklisted item, the script creates every possible regular expression of the form:

part1.*part2.*part3


The minimum and maximum part lengths, as well as the number of parts each item is split into, are configurable.

A "scoring" system evaluates each regular expression (better score for regex catching most blacklisted items, worse score for regex catching whitelisted items) and chooses the best one. Once choosen the best regex, the script removes the catched items and reruns to remove the blacklisted items left until every blacklisted item is removed.

Even if the script works fine on smaller lists, it appears it can take up to tens of hours (if not days) to generate such an optimized blacklist using real-life black- and whitelists.

I'm going to try to port it to C++ or C# to make it more efficient. Some algorithm optimizations should also help a lot...
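
To give an idea of the principle, here is a much-simplified toy version (fixed part length, naive scoring, made-up sample data; the real script is configurable and more thorough):

<?php
// toy version: greedily pick regexes of the form part1.*part2 that catch many
// blacklisted items and no/few whitelisted ones
$blacklist = array( 'cheap-poker-online.example', 'poker-casino.example', 'buy-cheap-pills.example' );
$whitelist = array( 'pokerfacemusic.example', 'cheapflightsblog.example' );

function candidates( $item, $part_len = 4 )
{
	$cands = array();
	for( $i = 0; $i + $part_len <= strlen( $item ); $i++ )
	{
		for( $j = $i + $part_len; $j + $part_len <= strlen( $item ); $j++ )
		{
			$cands[] = preg_quote( substr( $item, $i, $part_len ), '/' )
				. '.*'
				. preg_quote( substr( $item, $j, $part_len ), '/' );
		}
	}
	return $cands;
}

function best_regex( $blacklist, $whitelist )
{
	$best = preg_quote( $blacklist[0], '/' ); // fallback: the item itself
	$best_score = -1000000;
	foreach( $blacklist as $item )
	{
		foreach( candidates( $item ) as $regex )
		{
			$score = 0;
			foreach( $blacklist as $b ) { if( preg_match( "/$regex/i", $b ) ) $score++; }
			foreach( $whitelist as $w ) { if( preg_match( "/$regex/i", $w ) ) $score -= 10; } // heavy penalty
			if( $score > $best_score ) { $best_score = $score; $best = $regex; }
		}
	}
	return $best;
}

// greedy cover: pick the best regex, drop everything it catches, repeat
$filters = array();
while( count( $blacklist ) > 0 )
{
	$regex = best_regex( $blacklist, $whitelist );
	$filters[] = $regex;
	$remaining = array();
	foreach( $blacklist as $b )
	{
		if( !preg_match( "/$regex/i", $b ) ) { $remaining[] = $b; }
	}
	$blacklist = $remaining;
}

echo implode( "\n", $filters ) . "\n";
?>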

44 Jul 07, 2005 06:32

I saw the anti-spam solution on Drupal; they use a trainable Bayesian filter.
http://drupal.org/project/spam

I took a look at how they implement it, and I think there should be a mechanism like that in b2evo. I would like to write something more useful than what I currently use on b2evo. Below is the pseudocode I wrote; I'd appreciate any thoughts or comments about it.


define ('UNPUBLISH_THRESHOLD', '50');
define ('DELETE_THRESHOLD', '85');
/* two numbers to determine the action
 *
 *if rating > UNPUBLISH_THRESHOLD => mark as unpublished, display for the user to decide manually
 *if rating < UNPUBLISH_THRESHOLD => just wait and see, publish the comment (publish_threshold)
 *if rating > DELETE_THRESHOLD => delete, add into .htaccess, then report to the b2evo blacklist
 *if rating < 100-DELETE_THRESHOLD => really not spam, a cron job will auto-purge it from the database
 *
 */

function spam_main($url, $content, $filter_type){
//===== parsing something we need
  // transform http://my-domain.name.org.tw/directory/sample.php
  // to my-domain.name
  $domain_name = parse_url_to_domain($url);
  
  // only get between <a></a> and http://like.domain.name
  if( $content != NULL ) // NULL is comment/trackback, else is referer
     $link_in_text[] = parse_content_to_only_link($content);  
  }

  // tokenize my-domain.name
  // to my, domain, name
  $token[] = parse_domain_name_to_token($domain_name);
  while( $link_in_text[] is not NULL ){
    $token[] += parse_link_to_only_domain_name( $link_in_text[] );
  }

//===== calculate probability of this comment/trackback/referer
  $weight =0;
  if( $domain_name is digit ){
    $weight += spam_open_relay_check($domain_name);
  }
  if( $filter_type is user_define_yes or user_define no spam ){
    $weight += user_training_define_spam( $domain_name, $token[] );
    $rating = insert_update_db( $domain_name, $token[], $weight);

    // user define is 100% spam or 100% not, we won't run any other test
    $action = determine_action( $rating );
    return execute_filter($action);
  }
  else{
    // automatic spam have some characteristics
    //   1. long domain name
    //   2. high click rating in the same time interval
    //   3. replicate part of domain name (sex.123-abc.com, sex.jeffeji.com)
    // so we can use poll system to predict 
    $weight += evaluate_domain_name($domain_name);
    $weight += evaluate_domain_name_token($token[]);
  }

  $rating = insert_update_db( $domain_name, $token[], $weight);
  $action = determine_action( $rating );
  return execute_filter( $action );
}


I also sketched the database tables and fields:


table spam_url
---commentID(null for referer)
---rating
---link
---time
---interval
table spam_token
---rating
---token

45 Jul 07, 2005 10:34

isaac wrote:

Ben,

It'll probably be something like the scripts that I use to update and recheck my blacklist. http://isaacschlueter.com/admin/b2antispam_recheck.php and http://isaacschlueter.com/admin/b2antispam_poll.php

I just wanted to say this is working like a dream. I will be watching until the end of the month to see the bottom line on whether my bandwidth was saved, but at this point, compared with last month, it seems to be working.

It is also saving me a lot of hours that I don't have to spend now updating the antispam. I love the 'recheck all' feature also.

Biggest thing I love though is going to my blog sites and not seeing oodles of referrer spam. *YAY* :D

Heidi

46 Jul 07, 2005 23:02

Re: Bayesian filtering:
The powers that be are discussing something like this for an upcoming version.

Also, I decided that I have too many things in my blacklist that are repeats of one another, so I wrote up a pruner.

Check this out: http://isaacschlueter.com/tests/b2antispam_prune.php.txt

Strip the .txt and save it in your admin folder. When hit, it prunes your blacklist to remove needlessly duplicated entries. It cut the size of my .htaccess rules down by about 40%, which is nice.

47 Jul 19, 2005 17:25

Can someone decode this thread so that I can figure out which hacks to implement and how?

I am desperately trying to lower the server load caused by spam referrers (my site has been suspended for this reason; if I can implement some of the hacks suggested here, perhaps they will host my site again).

I'm pretty sure other technically challenged B2evolution users would appreciate an outline of the hacks available and instructions on how to apply them.

Thanks

48 Jul 25, 2005 02:55

I'm definitely technically challenged. I need a little more guidance here before my site goes down again because my bandwidth is gone.

Isaac with that first code you put in hacks.php, once that is done (I had to create that file first) is there something to do to activate it or something?

With the longer code you posted...what file does that get added into and where? Is it the htaccess file? And I should remove any RewriteCond's that I already added in?

50 Jul 27, 2005 06:47

I've hit a problem. When I try to leave a comment on my second blog... the page comes up asking you to click to bypass yada yada... but when you click it returns to my main blog and so it continues in a vicious circle.

How do I fix this???

51 Jul 27, 2005 07:11

isaac wrote:

Check this out:
http://isaacschlueter.com/tests/b2antispam_genhtaccess.php.txt

Save it in your admin folder, strip the .php, and run it.
Unless you set the $do_write option to true, it won't actually write anything, but will just tell you what it WOULD have done. If it's what you want, you can set it to true.

Other than that, just follow the instructions.

Thoughts? Comments?

Strip the php? From the file name? From the code? Doesn't it have to be php to run?

I tried it with the file named b2antispam_genhtaccess.php and got this:

Warning:  chmod(): No such file or directory in /home/purpleme/public_html/blog/admin/b2antispam_genhtaccess.php on line 86

Warning:  fopen(/home/purpleme/public_html/blog/admin/../.htaccess): failed to open stream: No such file or directory in /home/purpleme/public_html/blog/admin/b2antispam_genhtaccess.php on line 88

Warning:  filesize(): Stat failed for /home/purpleme/public_html/blog/admin/../.htaccess (errno=2 - No such file or directory) in /home/purpleme/public_html/blog/admin/b2antispam_genhtaccess.php on line 89

Warning:  fread(): supplied argument is not a valid stream resource in /home/purpleme/public_html/blog/admin/b2antispam_genhtaccess.php on line 89

Warning:  fclose(): supplied argument is not a valid stream resource in /home/purpleme/public_html/blog/admin/b2antispam_genhtaccess.php on line 90

I had some kind of problem writing to the file.

Warning:  fclose(): supplied argument is not a valid stream resource in /home/purpleme/public_html/blog/admin/b2antispam_genhtaccess.php on line 207

Warning:  chmod(): No such file or directory in /home/purpleme/public_html/blog/admin/b2antispam_genhtaccess.php on line 208

I tried build three and got errors also.

Am I supposed to edit the location of the htaccess file portion of the code?

I did CHMOD the file to 777.

I really need this dumbed down I guess. I can't get any of it to work.

52 Jul 27, 2005 14:42

To your first question: check and see if you've added your own site to the antispam blacklist. Or a keyword from the central list might be contained in your URL.

53 Jul 27, 2005 16:59

First question:
You answered it yourself. That was a typo on my part, and should be "Strip the .txt from the filename." I.e., save it as b2antispam_genhtaccess.php. The .txt is there so that you can see the file instead of running it.

All the errors in the code are because you do not have a .htaccess file residing at the place where it's looking.

You should have a file called .htaccess residing in your blog installation folder. (That is, the folder just above your admin folder, usually either the root of your site or /blogs/.) .htaccess is just a regular old text file. The filename is just .htaccess. (Yes, it starts with a period, so it's really only got an extension. This tells Apache not to share the file with outsiders.)

// where's the .htaccess file that we should be editing?
// hint: you can set this to "test.txt" or something to check out what it does without committing.
$htaccess_file = dirname(__FILE__) . "/$admin_dirout/.htaccess";

This bit is trying to find the .htaccess file. You shouldn't have to edit this if you've set it up the way that I just said.

As always, and this should go without saying, test everything first! If you have a local machine that has Apache set up, then try it there first. If you don't, then you probably shouldn't be monkeying around with .htaccess files, because you can really easily turn your whole site into one big HTTP 500 error with a single misspelled directive. (Of course, you can always fix the problem by just removing the offending line, or the whole .htaccess file if you can't figure out which line is doing it.)

54 Jul 27, 2005 22:18

I don't have/know how to check on a local machine.
I've kept a back up of the original .htaccess file

Granted I don't know exactly what I'm doing but I have to do something to stop this crap. OR just close both my blogs and say fuck it. :-/

I'll check the location of the file and see if I need to move it.

55 Jul 27, 2005 22:25

Okay I moved the .htaccess file.

I've put the file back now and its still generating errors
http://foreverpurple.com/blog/admin/b2antispam_genhtaccess.php

The only thing I found in my black list was forever.kz and I unbanned it.

I'm able to post from work at my main blog www.foreverpurple.com/blog
but at the second www.foreverpurple.com/blog/crab_blog.php it still thinks I'm a spammer. I logged out to do this, btw.

You are being denied access to this page because you have been referred here by a known or suspected spammer: [http://foreverpurple.com/blog/crab_blog.php].

56 Jul 28, 2005 00:22

Daethian,

FTP the existing .htaccess file to your local machine. Please copy and paste the entire contents into a PM to me.

You've also probably got a .htaccess file in the root of your system (since it redirects from http://foreverpurple.com to http://foreverpurple.com/blog.)

I'm curious as to what's going on here.

57 Jul 28, 2005 07:19

I put my original .htaccess files back in place. The same thing is happening, where it thinks that I'm a spammer. I took the hacks.php file off and now it's just telling me the authimage code is wrong. I'm going to remove that too and try to start over.

58 Jul 28, 2005 07:36

I had edb's hack in place too.

So now I've removed everything I've done except renaming my htrserver file.

Everything is working again.

I'm going to retry yours again by itself and see what happens.

59 Aug 12, 2005 23:13

Just wanted to say that we'll integrate this pretty efficient hack into future releases.

(We won't block open mail relays though... those may also be proxies for legitimate users).

Also, the main reason we don't have regexps in the antispam list is that regexps take significantly longer to process. This is why querying mysql with LIKE *might* still be faster than having REGEXPs in the Apache conf files. (I wish someone would check that though... ;) )

60 Aug 12, 2005 23:23

Thanks for making this a sticky post, and I'm glad to hear the functionality will become part of the future of b2evolution. The first post all by itself cut my bandwidth consumption in half. Way cool! The pruner near the end cuts about 35% from the full list of banned keywords. Another cool deal, but the first bit (assuming up-to-date antispam lists) is an incredible benefit.

EVERYONE should be using it.

62 Aug 13, 2005 02:13

isaac wrote:

Re: Bayesian filtering:
The powers that be are discussing something like this for an upcoming version.

Also, I decided that I have too many things in my blacklist that are repeats of one another, so I wrote up a pruner.

Check this out: http://isaacschlueter.com/tests/b2antispam_prune.php.txt

Strip the .txt and save it in your admin folder. When hit, it prunes your blacklist to remove needlessly duplicated entries. It cut the size of my .htaccess rules down by about 40%, which is nice.

Another way to remove needless entries from the blacklist table is to execute this single-line SQL command:

DELETE evo_antispam FROM evo_antispam AS a, evo_antispam AS b WHERE a.aspm_ID<>b.aspm_ID AND a.aspm_string LIKE CONCAT('%',b.aspm_string,'%')


The above line may take a while (more than the 30 seconds allowed by most PHP-based [url=http://www.mysql.com]MySQL[/url] clients like [url=http://www.phpmyadmin.net/home_page/]phpMyAdmin[/url]). Anyway, it shouldn't be slower than your original PHP+MySQL version.

63 Aug 13, 2005 02:45

I am getting a crazy error much like the one Daethian2 talked about. Here is the error I get when running b2antispam_genhtaccess.php:

Warning: fopen(/home/keninman/public_html/blog/admin/../.htaccess): failed to open stream: Permission denied in /home/keninman/public_html/blog/admin/b2antispam_genhtaccess.php on line 208

Warning: fwrite(): supplied argument is not a valid stream resource in /home/keninman/public_html/blog/admin/b2antispam_genhtaccess.php on line 209

I had some kind of problem writing to the file.

Warning: fclose(): supplied argument is not a valid stream resource in /home/keninman/public_html/blog/admin/b2antispam_genhtaccess.php on line 213

It looks to me like the path it is generating for my .htaccess file begins with /home/keninman/ instead of just starting with /public_html, and this is why I am getting the error. So I ask: how does this file figure out where the .htaccess file is located, why is it adding the /home/keninman to the path of the .htaccess file, and how do I correct it? I have searched and I cannot figure it out.

64 Aug 17, 2005 23:02

Hi. I'm sorry if this should be in a different thread. I implemented the technique mentioned in the first post of this thread (the conf/hacks.php file), and it seems to be working, except that I, too, am getting error messages. When I try to post a comment or modify my _main.php, I get an error message like this:

Warning: Cannot modify header information - headers already sent by (output started at /home/samwood/public_html/conf/hacks.php:42) in /home/samwood/public_html/htsrv/comment_post.php on line 202

Warning: Cannot modify header information - headers already sent by (output started at /home/samwood/public_html/conf/hacks.php:42) in /home/samwood/public_html/htsrv/comment_post.php on line 203

Warning: Cannot modify header information - headers already sent by (output started at /home/samwood/public_html/conf/hacks.php:42) in /home/samwood/public_html/htsrv/comment_post.php on line 204

Warning: Cannot modify header information - headers already sent by (output started at /home/samwood/public_html/conf/hacks.php:42) in /home/samwood/public_html/htsrv/comment_post.php on line 205

Warning: Cannot modify header information - headers already sent by (output started at /home/samwood/public_html/conf/hacks.php:42) in /home/samwood/public_html/htsrv/comment_post.php on line 209

I've been fairly desperate to stop the bandwidth leakage, but I think my visitors are seeing this message when they post, too, which results in them posting the same message all over again. Is there any way to fix this?

65 Aug 17, 2005 23:49

I don't know for sure, but I'd say look for whitespace in any and all of the files you've edited, focusing of course on those you were working with right before the error showed up. Whitespace here means anything and everything after the closing "?>", including an empty next line. BTW, sometimes it's the editor you use that makes bad things happen. I don't know which editors would be bad, but sometimes editors add stuff that (effectively) corrupts the file. Try opening your files in good old Notepad to make sure whatever you're using isn't part of the problem.
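
For what it's worth, a hacks.php that can't leak whitespace looks something like this (just an illustration):

<?php
// conf/hacks.php (illustration only)
if( !empty( $_SERVER['HTTP_REFERER'] ) )
{
	// ... the referrer checks from the first post go here ...
}
// Either make the closing "?>" the very last characters in the file,
// or leave it out entirely; then no stray whitespace can be sent before
// b2evolution tries to send its own headers.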

66 Aug 18, 2005 00:06

Getting rid of the whitespace fixed the problem. Thank you, EdB!

67 Oct 11, 2005 04:23

isaac wrote:

Check this out:
http://isaacschlueter.com/tests/b2antispam_genhtaccess.php.txt

Save it in your admin folder, strip the .txt off of the filename, and run it.
Unless you set the $do_write option to true, it won't actually write anything, but will just tell you what it WOULD have done. If it's what you want, you can set it to true.

Other than that, just follow the instructions.

Thoughts? Comments?

Isaac... what if I already have a full htaccess file, where I have my stubs defined, the pineappleproxy stuff, among other things... Will your genhtaccess append or overwrite?

jj.

68 Oct 11, 2005 12:35

I use this script via cron daily and it appends to my .htaccess.

It works great, for me anyway.

You can set it up to write to any file, and then when you are happy you can point it over to your .htaccess.

Jon

69 Oct 17, 2005 17:51

I added the 403 redirector from the first post to my hacks.php 7 days ago... I already have 20,300 403s in my stats since then, and my bandwidth usage has dropped nicely. Sweet!

jj.

70 Oct 25, 2005 21:47

After adding the script to my hacks.php file, as I mentioned in the post above, my bandwidth usage has dropped dramatically, but I was *occasionally* getting a few referral spams that got through, and I had to manually remove them or wait for the next automatic antispam update (using EdB's antispam hack)... No big deal, it was only one or two that got through per day...

So anyway, about a week ago I edited the script by uncommenting the line:


    // un-comment the next line of code to redirect back to the referring
    // page. I didn't do this, in the event that perhaps there is a false
    // positive, but you needn't be so kind.
    // In any event, the bandwidth is teensy either way.
    header('Location: ' . $_SERVER['HTTP_REFERER']);

which bounces the referral spammer back to his own site... Since then, I haven't had a single referral spam get through to my site. I don't know if this is just a coincidence or not, but it's definitely made things even easier now.

jj.

