1 capnrob Aug 03, 2007 00:04
3 capnrob Aug 03, 2007 00:25
Holy codfish, you should see how badly I'm getting beaten up by legit Yahoo bots! Well, legit assuming they come back from reverse DNS OK.
And thanks. I've been trying to find Yahoo's official line on this for a few days, but I guess it all boils down to the better search phrase.
I'm off to check the crawlers.
On filtering: I wouldn't want to filter valid blog posts, and I've already applied their suggested change to keep the crawl frequency down:
User-agent: Slurp
Crawl-delay: 5
That did nothing.
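For anyone wanting to rule out a syntax problem first, here is a minimal sketch that feeds those two lines to Python's standard `urllib.robotparser` and reads back the delay. It only shows how a well-behaved parser would interpret the rule; it says nothing about whether Slurp actually honors it. The "Googlebot" lookup is just there as a contrasting agent that has no rule.

```python
from urllib.robotparser import RobotFileParser

# The exact rule quoted in the post above.
robots_txt = """\
User-agent: Slurp
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# crawl_delay() returns the delay for a matching agent, or None if
# there is no matching rule.
slurp_delay = parser.crawl_delay("Slurp")
other_delay = parser.crawl_delay("Googlebot")

print("Slurp crawl delay:", slurp_delay)
print("Googlebot crawl delay:", other_delay)
```

If the delay comes back as expected, the file itself is fine and the problem is on the crawler's side.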
If a crawler consumes more than one gig of bandwidth per month and isn't Google, is it really a good investment to allow it?
4 afwas Aug 03, 2007 00:32
CapnRob wrote:
but I guess it all boils down to the better search phrase.
crawl.yahoo.net
CapnRob wrote:
User-agent: Slurp
Crawl-delay: 5
that did nothing.
I'm not into robots.txt. Perhaps someone else is. Is Slurp the correct identifier?
Good luck
5 capnrob Aug 03, 2007 00:56
You know, it's worse than I thought. And yes, Yahoo's user agent is identified in robots.txt as "Slurp".
There's a whole lot of info on Yahoo's over-indexing here:
http://www.jackhumphrey.com/fridaytrafficreport/search-engine-optimization/yahoo-slurp-has-been-banned-from-ftr/
And yes, all the crawlers I ran a reverse DNS lookup on identified as valid Yahoo spiders.
(thanks Afwas)
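For reference, the reverse DNS check can be scripted. This is a rough sketch, not an official Yahoo procedure: it assumes genuine crawlers resolve to a hostname ending in `crawl.yahoo.net` (the domain mentioned earlier in this thread), and it adds a forward lookup to confirm the hostname points back to the same IP. The function names and the suffix list are my own, and `gethostbyname_ex` only covers IPv4.

```python
import socket

# Assumption: hostnames under this domain belong to Yahoo's crawler.
YAHOO_SUFFIXES = (".crawl.yahoo.net",)


def hostname_is_yahoo(hostname, suffixes=YAHOO_SUFFIXES):
    """Return True if the hostname ends with one of the trusted suffixes."""
    host = hostname.rstrip(".").lower()
    return any(host.endswith(suffix) for suffix in suffixes)


def is_genuine_yahoo_crawler(ip,
                             reverse=socket.gethostbyaddr,
                             forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then forward-confirm it."""
    try:
        hostname = reverse(ip)[0]
    except OSError:
        # No PTR record or lookup failure: treat as unverified.
        return False
    if not hostname_is_yahoo(hostname):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    # The hostname must resolve back to the IP we started from.
    return ip in addresses
```

Call `is_genuine_yahoo_crawler(ip)` with an address taken from your raw logs. The `reverse` and `forward` parameters exist only so the lookups can be swapped out when testing offline.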
Yahoo crawlers took up 1.5 gigs of my bandwidth for July. On the other side of the coin, Google took about 1.2 gigs.
Referrals from Google were 80% higher than from Yahoo.
Google hits late at night and all at once, whereas Yahoo hits me one article at a time all day and night long.
I'm actually going to disallow Yahoo's crawlers altogether as I think this behavior is abusive. I'd probably advise everyone to take a peek at their raw logs and see how Yahoo is treating you.
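If you want to check your own raw logs, something along these lines will total the bytes served to each crawler. It is a sketch under assumptions: it expects the Apache "combined" log format, matches crawlers by a substring of the user agent ("slurp", "googlebot"), and silently skips any line it cannot parse. The function name and the log path in the usage comment are hypothetical.

```python
import re
from collections import Counter

# Assumption: Apache "combined" format, e.g.
# ip - - [date] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

# Label -> lowercase substring to look for in the user agent.
CRAWLERS = {"Yahoo Slurp": "slurp", "Googlebot": "googlebot"}


def bytes_by_crawler(lines):
    """Sum response sizes per crawler label over an iterable of log lines."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # not in the expected format
        size = match.group("size")
        sent = 0 if size == "-" else int(size)
        agent = match.group("agent").lower()
        for label, needle in CRAWLERS.items():
            if needle in agent:
                totals[label] += sent
                break
    return totals


# Usage (hypothetical path):
# with open("access.log", encoding="utf-8", errors="replace") as log:
#     for label, total in bytes_by_crawler(log).items():
#         print(label, round(total / 1024 ** 2, 1), "MB")
```

Since user agents are easy to fake, the totals are only as trustworthy as the user agent strings in the log.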
6 afwas Aug 03, 2007 02:02
This is very interesting stuff that will certainly be of use for the blogs that generate a lot of traffic of their own.
As I am not opposed to ad programs and certainly not to generating high rankings to promote one's blog, I think an article about traffic would do well in the docs.
Of course we need a little more investigation, as one article and one opinion aren't enough to advise banning Yahoo altogether, but this looks like a serious case.
Thanks
7 capnrob Aug 03, 2007 02:13
Heck, I'd love the indexing if it weren't so rude. In any case, I don't advise anyone to do what I'm doing,
but I do think that IF Yahoo's crawlers are now being unruly, then maybe the community ought to know.
It's very likely that my situation is unique.
At any rate, I'm also curious whether anyone else's logs show this behavior.
Alright, that's enough thinking for one day. I'm off to the pub.
B)
According to [url=http://www.ysearchblog.com/archives/000460.html]Yahoo Search Blog[/url] it *is* genuine Yahoo. I think the comment pages are just the links it finds on your blog.
They do a lot of crawling on your site, I must admit. If it is genuine, it obeys robots.txt, and you could filter the paths you don't want it to visit.
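As a rough illustration of filtering paths, the sketch below checks a set of Disallow rules with Python's standard `urllib.robotparser` before you upload them. The `/comments/` and `/trackback/` paths and the sample URLs are made up for the example; substitute whatever your blog actually uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep Slurp out of comment and trackback pages only.
rules = """\
User-agent: Slurp
Disallow: /comments/
Disallow: /trackback/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch() reports whether the named agent may request the path.
blocked = parser.can_fetch("Slurp", "/comments/123")
allowed = parser.can_fetch("Slurp", "/2007/08/some-post")
unaffected = parser.can_fetch("Googlebot", "/comments/123")

print("Slurp on comment page:", blocked)
print("Slurp on blog post:", allowed)
print("Googlebot on comment page:", unaffected)
```

This way the blog posts stay indexed and only the extra pages are excluded, provided the crawler obeys the file.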
Perhaps some other visitor of this forum can shed some more light on this topic.
Good luck