1 Aug 03, 2007 00:04    

My b2evolution Version: 1.10.x

Seems to be a new twist in the blog defacement or spamvertising world - or other...

I just updated to the latest B2 Version after getting a bunch of "nonsense" spam - you know the kind.

Anyway, with the new Florida and Turing test plugin (thanks Ed), I'm getting zero spam.

However, looking in my raw logs, I'm still getting pummeled with requests like these:

Host: lj512018.crawl.yahoo.net (many similar crawlers with just a numerical difference)

/blog/index.php?title=new_years_eve&more=1&c=1&tb=1&pb=1

The &more=1&c=1&tb=1&pb=1 part - am I wrong, or is it engineered to bring the comments form up?
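For anyone who wants to pick that request apart, here is a minimal Python sketch that splits the query string into its parameters. The meanings in ASSUMED_MEANING are my assumption about how b2evolution reads these flags (full post, comments, trackbacks, pingbacks), not something confirmed in this thread, so check them against your own version.

```python
from urllib.parse import urlsplit, parse_qs

# Request path copied from the log excerpt above.
REQUEST = "/blog/index.php?title=new_years_eve&more=1&c=1&tb=1&pb=1"

# ASSUMPTION: what each b2evolution flag asks for. Verify against your version.
ASSUMED_MEANING = {
    "more": "show the full post",
    "c": "show comments",
    "tb": "show trackbacks",
    "pb": "show pingbacks",
}


def describe_request(request):
    """Return {param: (value, assumed meaning)} for a logged request path."""
    query = parse_qs(urlsplit(request).query)
    return {
        name: (values[0], ASSUMED_MEANING.get(name, "unknown"))
        for name, values in query.items()
    }


if __name__ == "__main__":
    for name, (value, meaning) in describe_request(REQUEST).items():
        print(f"{name}={value}: {meaning}")
```

If those meanings hold, the URL is simply the full permalink view of a post, which is the kind of link a crawler could pick up from the blog itself.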

The crawler's user agent identifies as Yahoo/Slurp, but there's no way that 50 hits per 10 minutes every day is from Yahoo.
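To put a number on the hit rate, a rough Python sketch like the one below counts requests per host in 10-minute windows. It assumes the raw log is in the common Apache "combined" format with hostnames resolved; the sample lines are made up for illustration and are not from my real logs.

```python
import re
from collections import Counter
from datetime import datetime

# ASSUMPTION: Apache-style log line: host ident user [time] "request" ...
LOG_LINE = re.compile(r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)"')

# Hypothetical sample lines, for illustration only.
SAMPLE_LINES = [
    'lj512018.crawl.yahoo.net - - [02/Aug/2007:23:51:07 -0400] "GET /blog/index.php?title=new_years_eve&more=1&c=1&tb=1&pb=1 HTTP/1.0" 200 5120',
    'lj512018.crawl.yahoo.net - - [02/Aug/2007:23:58:40 -0400] "GET /blog/index.php HTTP/1.0" 200 4096',
    'lj512018.crawl.yahoo.net - - [03/Aug/2007:00:03:12 -0400] "GET /blog/index.php HTTP/1.0" 200 4096',
]


def hits_per_window(lines, minutes=10):
    """Count requests per (host, start of time window)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not look like log entries
        when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        # Round the timestamp down to the start of its window.
        bucket = when.replace(
            minute=when.minute - when.minute % minutes, second=0, microsecond=0
        )
        counts[(match.group("host"), bucket)] += 1
    return counts


if __name__ == "__main__":
    for (host, bucket), hits in sorted(hits_per_window(SAMPLE_LINES).items()):
        print(bucket.isoformat(), host, hits)
```

Feeding it the lines of a real log file instead of SAMPLE_LINES would show whether any single host really reaches 50 hits per window, or whether the load is spread over many crawler hosts.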

Is anyone familiar with Yahoo's official crawler? Is this crawler (or are these crawlers) bogus?

And lastly, does anyone know the source of these? I'm wondering because they don't consume a lot of bandwidth per request, BUT they do put a strain on server resources, and that might be a problem for some.

I'm almost thinking someone engineered a worm or botnet to post nonsense spam solely for the resource drain.

Any comments?

2 Aug 03, 2007 00:14

According to [url=http://www.ysearchblog.com/archives/000460.html]Yahoo Search Blog[/url] it *is* genuine Yahoo. I think the comment pages are just the links it finds on your blog.
They do a lot of searching on your site I must admit. If it is genuine it obeys robots.txt and you could filter the paths you don't want it to visit.

Perhaps some other visitor of this forum can shed some more light on this topic.

Good luck

3 Aug 03, 2007 00:25

Holy codfish - you should see how badly I'm getting beaten up by legit Yahoo bots! - well, legit assuming they come back from reverse DNS OK.

And thanks - I've been trying to find Yahoo's official line on this for a few days, but I guess that all boils down to the better search phrase.

I'm off to check the crawlers -

On filtering - well, I wouldn't want to filter valid blog posts - and I've already done their mod on how to keep the frequency down -

User-agent: Slurp
Crawl-delay: 5

that did nothing.
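One thing that can at least be ruled out is a typo in the file. This is a small sketch using Python's standard urllib.robotparser to confirm that those two lines parse as a 5 second delay for the "Slurp" token. It only checks the syntax; it says nothing about whether the crawler honours the value.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines quoted above.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 5
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay that applies to user_agent, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print("Slurp:", crawl_delay_for(ROBOTS_TXT, "Slurp"))
    print("Googlebot:", crawl_delay_for(ROBOTS_TXT, "Googlebot"))
```

If the delay comes back as expected for Slurp and as None for everything else, the file is doing what it says and the problem lies elsewhere.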

If a crawler consumes more than one gig of bandwidth per month and isn't Google - is it really a good investment to allow it?

4 Aug 03, 2007 00:32

CapnRob wrote:

but I guess that all boils down to the better search phrase.

crawl.yahoo.net

CapnRob wrote:

User-agent: Slurp
Crawl-delay: 5

that did nothing.

I'm not into robots.txt. Perhaps someone else is. Is Slurp the correct identifier?

Good luck

5 Aug 03, 2007 00:56

You know - it's worse than I thought - and yes, Yahoo's user agent in robots.txt is identified as "Slurp".

There's a whole lot of info on Yahoo's over-indexing here:
http://www.jackhumphrey.com/fridaytrafficreport/search-engine-optimization/yahoo-slurp-has-been-banned-from-ftr/

And yes, all the crawlers I reverse DNS'ed identified as valid Yahoo spiders.
(thanks Afwas)
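For anyone who wants to repeat the check, this is roughly what it looks like as a Python sketch: a reverse lookup on the IP, a test that the name ends in .crawl.yahoo.net (the suffix seen in my logs), and then a forward lookup to confirm the name points back at the same IP. The function name and the injectable lookup parameters are my own, added so the check can be tried without touching the network.

```python
import socket


def is_verified_crawler(ip, allowed_suffixes=(".crawl.yahoo.net",),
                        reverse_lookup=socket.gethostbyaddr,
                        forward_lookup=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the hostname suffix, then forward-confirm."""
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no reverse DNS record
    if not hostname.lower().endswith(tuple(allowed_suffixes)):
        return False  # name is not under an expected crawler domain
    try:
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False  # name does not resolve
    # Anyone can set their own reverse DNS, so the name must map back to the IP.
    return ip in addresses


if __name__ == "__main__":
    import sys
    for address in sys.argv[1:]:
        print(address, is_verified_crawler(address))
```

The forward step matters because a user agent string and even a reverse DNS name can be faked by whoever controls the IP block; the matching forward record cannot.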

Yahoo crawlers took up 1.5 gigs of my bandwidth for July. On the other side of the coin, Google took about 1.2 gigs.
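If your stats package doesn't break bandwidth down by crawler, a sketch along these lines can total it from the raw log. It assumes the Apache "combined" format (status, bytes, referer, user agent at the end of each line) and matches crawlers by a substring of the user agent; the sample lines are invented for illustration.

```python
import re
from collections import defaultdict

# ASSUMPTION: line ends with: "request" status bytes "referer" "user agent"
LOG_TAIL = re.compile(
    r'"[^"]*" (?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"$'
)

# Substrings used to recognise each crawler's user agent.
CRAWLERS = {"Yahoo": "slurp", "Google": "googlebot"}

# Hypothetical sample lines, for illustration only.
SAMPLE_LINES = [
    'lj512018.crawl.yahoo.net - - [02/Aug/2007:23:51:07 -0400] "GET /blog/index.php HTTP/1.0" 200 5120 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    'lj512019.crawl.yahoo.net - - [02/Aug/2007:23:58:40 -0400] "GET /blog/index.php HTTP/1.0" 200 4096 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    'crawl-66-249-66-1.googlebot.com - - [03/Aug/2007:02:10:00 -0400] "GET /blog/index.php HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    'host.example.com - - [03/Aug/2007:02:11:00 -0400] "GET /blog/index.php HTTP/1.1" 304 - "-" "Mozilla/5.0 (Windows; U)"',
]


def bytes_by_crawler(lines):
    """Sum response bytes per crawler, keyed by the names in CRAWLERS."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_TAIL.search(line.strip())
        if not match or match.group("size") == "-":
            continue  # unparseable line, or a response with no body
        agent = match.group("agent").lower()
        for name, marker in CRAWLERS.items():
            if marker in agent:
                totals[name] += int(match.group("size"))
    return dict(totals)


if __name__ == "__main__":
    for name, total in bytes_by_crawler(SAMPLE_LINES).items():
        print(f"{name}: {total / 1024 ** 2:.2f} MB")
```

Since it trusts the user agent string, it is best combined with a reverse DNS check before drawing conclusions about who is really eating the bandwidth.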

Referrals for Google were 80% higher than from Yahoo

Google hits late at night and all at once - whereas Yahoo hits me one article at a time all day/night long.

I'm actually going to disallow Yahoo's crawlers altogether as I think this behavior is abusive. I'd probably advise everyone to take a peek at their raw logs and see how Yahoo is treating you.
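For reference, the kind of robots.txt I have in mind is below, wrapped in a small Python check (standard urllib.robotparser) so the rules can be tested before they go live. The exact rules are my own sketch, and the blog path is just the one from my logs; whether a given crawler obeys them is a separate question.

```python
from urllib.robotparser import RobotFileParser

# Sketch of a robots.txt that shuts out Slurp and leaves everyone else alone.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /

User-agent: *
Disallow:
"""


def allowed(robots_txt, user_agent, path):
    """Return True if the rules in robots_txt let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("Slurp", "Googlebot"):
        print(agent, allowed(ROBOTS_TXT, agent, "/blog/index.php"))
```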

6 Aug 03, 2007 02:02

This is very interesting stuff that will certainly be of use for blogs that generate a lot of traffic of their own.

As I am not opposed to ad programs, and certainly not to generating high rankings to promote one's blog, I think an article about traffic would do good in the docs.

Of course we need a little more investigation, as only one article and one opinion isn't enough to advise banning Yahoo altogether, but this looks like a serious case.

Thanks

7 Aug 03, 2007 02:13

Heck, I'd love the indexing - if it weren't so rude - and I don't advise anyone to do what I do in any case

- but I do think IF Yahoo's crawlers are now being unruly - then maybe the community ought to know.

It's very likely that my situation is unique -

at any rate, I'm also curious whether anyone's logs show this behavior.

Alright - that's enough thinking for one day - I'm off to the pub.

B)
