
1 Sep 30, 2005 14:47    

As I've just been the victim of Gigabot trawling every link on my site several times, I've noticed a problem with the b2evo calendar: it contains links to future months, each of those pages links to the next future month, and so on until PHP's definition of "the end of time".

This resulted in the crawler absolutely hammering my site, and pulling down nothing but empty archives of posts from 50 years into the future...

Can the calendar be modified to remove these "future archives"? I've searched on here and not found any other references to this problem, but it may account for many people's high bandwidth usage.

2 Sep 30, 2005 16:37

Try the hack described here:

    Search Engines Optimization (SEO): http://forums.b2evolution.net/viewtopic.php?t=4459

It won't stop search engine robots from crawling your calendar until the fiftieth millennium, but it will keep your blog's useless pages from being indexed. A robots.txt file in your root web directory should also help reduce useless crawling, since a year can be treated as a folder. That should help you a bit (this robots.txt file is based on mine):

    # Rules for all robots. The * and $ wildcards and Crawl-delay are
    # extensions that not every crawler honors.
    User-agent: *
    Crawl-delay: 120
    Disallow: /yourstubfileoranything.php/2006
    Disallow: /xmlsrv/*/rdf.*$
    Disallow: /xmlsrv/*/rss.*$
    Disallow: /xmlsrv/*/atom.*$
    Disallow: /*.avi$
    Disallow: /*.mpg$
    Disallow: /*.mpeg$
    Disallow: /*.mov$
    Disallow: /*.rm$
    
    # Robots that are blocked from the whole site.
    User-agent: OmniExplorer
    User-agent: larbin
    User-agent: Yandex
    User-agent: Grub.org
    User-agent: Katalog
    User-agent: BaiDuSpider
    Disallow: /
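
Since each year behaves like a folder, one Disallow line only covers one year, so later years need their own lines. Here is a rough Python sketch that generates those lines and checks them with the standard library's `urllib.robotparser`. The function name `future_year_rules`, the stub file name, and the 2006-2010 range are placeholders, not anything from b2evolution. Python's parser does plain prefix matching and ignores wildcards, which is why the sketch sticks to prefix rules.

```python
from urllib.robotparser import RobotFileParser


def future_year_rules(stub, first_year, last_year):
    """Build robots.txt lines that disallow one archive 'folder' per year."""
    lines = ["User-agent: *"]
    for year in range(first_year, last_year + 1):
        # Plain prefix rule: matches every URL that starts with /<stub>/<year>
        lines.append(f"Disallow: /{stub}/{year}")
    return lines


# Hypothetical stub name and year range, for illustration only.
rules = future_year_rules("yourstubfileoranything.php", 2006, 2010)

# Feed the generated lines to the standard-library parser to check them.
parser = RobotFileParser()
parser.parse(rules)

print("\n".join(rules))
print(parser.can_fetch("Gigabot", "/yourstubfileoranything.php/2008/03/"))
print(parser.can_fetch("Gigabot", "/yourstubfileoranything.php/2005/09/"))
```

Paste the printed lines into the `User-agent: *` group of your own robots.txt. A well-behaved crawler should then skip the listed years, but nothing forces a robot to obey.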

3 Sep 30, 2005 16:46

Thanks for that. I'd already implemented a robots.txt, and have now blocked the specific IPs that Gigabot uses in the .htaccess file, which has at least allowed the server to remain online...

For now I've just removed the calendar from the template - and will probably take a look at the code myself over the weekend to see if I can stop it showing the "next month" link into the future. If I succeed I'll post here...
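
The check I have in mind is roughly the following, sketched in Python just to show the logic. It is not b2evolution's actual calendar code, and `next_month_link` is a made-up name. The idea is to work out the month the "next month" link would point to and drop the link if that month is later than the current one.

```python
from datetime import date


def next_month_link(year, month, today=None):
    """Return (year, month) for the 'next month' link, or None if the
    link would point past the current month."""
    today = today or date.today()
    # Roll over to January of the following year after December.
    nxt = (year + 1, 1) if month == 12 else (year, month + 1)
    # Tuples compare year first, then month.
    if nxt > (today.year, today.month):
        return None
    return nxt


today = date(2005, 9, 30)
print(next_month_link(2005, 8, today))   # a past month still gets a link
print(next_month_link(2005, 9, today))   # the current month gets none
```

With no forward link on the current month, a crawler has nothing to follow into the empty future archives.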

4 Sep 30, 2005 16:52

tom_arush wrote:

For now I've just removed the calendar from the template - and will probably take a look at the code myself over the weekend to see if I can stop it showing the "next month" link into the future. If I succeed I'll post here...

That might solve your problem in a relatively easy way: why not add a rel="nofollow" attribute to your calendar's a tags?
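
To show what the generated markup would look like, here is a small Python sketch. The helper name `calendar_link` and the example URL are invented for illustration, and the real change would go into the PHP that prints the calendar links. Note that nofollow is only a hint, so a crawler may still choose to follow the link.

```python
from html import escape


def calendar_link(href, label):
    """Render a calendar link that asks robots not to follow it."""
    # Escape both values so they are safe inside the HTML output.
    return f'<a href="{escape(href, quote=True)}" rel="nofollow">{escape(label)}</a>'


# Hypothetical archive URL for October 2005.
print(calendar_link("/index.php?m=200510", "Oct"))
```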

