
1 Jul 15, 2008 19:44    

My b2evolution Version: 1.10.x

My stats section shows 0 robots indexing the blog site. Granted, I only have one entry, but I'm about to start blogging again soon and am wondering what could be causing this.

http://marunchak.co.uk/blog/

Google last indexed www.marunchak.co.uk in June yet nothing seems to be happening in my /blog directory.

Using 1.10

What gives?

3 Jul 15, 2008 20:58

If it's as simple as that I should be ok then. ;p

I'm not sure how the indexing works but it's as though it's ignored the blog directory completely.

Edit: Hell, it even picked up a flash animation I forgot I ever made.
http://marunchak.co.uk/Steorn/Demo.swf

4 Jul 16, 2008 09:59

I just realised that I haven't been generating any static pages. Since the bots can't go into the database to retrieve content (thank god), does it make sense to assume that, without static pages, the bots wouldn't find anything?

5 Jul 16, 2008 10:06

No

I don't use static pages on my site and the evil googy bot has no problem indexing me.

Might be worth checking that you haven't banned them in your robots.txt
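A quick way to test the robots.txt theory is to paste the file's contents into a small script. The following is only a minimal sketch: `is_blocked_by_robots` is a made-up helper, not part of b2evolution, and it is deliberately simplified. It treats both `*` groups and groups naming the bot as applying, and it ignores `Allow` lines and wildcards.

```php
<?php
/**
 * Hypothetical helper (not part of b2evolution): returns true if any
 * Disallow rule that applies to $agent is a prefix of $path.
 * Simplified: no Allow handling, no wildcards, no group precedence.
 */
function is_blocked_by_robots($robots_txt, $agent, $path)
{
    $applies = false;        // does the current group apply to $agent?
    $in_agent_lines = false; // still reading consecutive User-agent lines?

    foreach (preg_split('/\r\n|\r|\n/', $robots_txt) as $line) {
        // Strip comments and surrounding whitespace
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        list($field, $value) = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            if (!$in_agent_lines) {
                $applies = false; // a new group starts here
            }
            $in_agent_lines = true;
            if ($value === '*' || ($value !== '' && stripos($agent, $value) !== false)) {
                $applies = true;
            }
        } else {
            $in_agent_lines = false;
            // An empty Disallow value means "nothing is disallowed"
            if ($field === 'disallow' && $applies && $value !== '' && strpos($path, $value) === 0) {
                return true;
            }
        }
    }
    return false;
}

// Example rules, invented for illustration (not the poster's real robots.txt)
$example = "User-agent: *\nDisallow: /blog/\n";
var_dump(is_blocked_by_robots($example, 'Googlebot/2.1', '/blog/'));
```

If this reports a block for `/blog/` against your real robots.txt, that would explain the missing bot hits. If it does not, the cause is probably elsewhere.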

¥

*edit*
Just checked your home page: having it "refresh" to your blog URL is probably what's killing your ability to be indexed.

Consider either moving your blog up a folder (so it's on your home URL) or using .htaccess to do a 301 (permanent) redirect to the correct URL. A 302 would only tell the bots the move is temporary.

Personally I'd go with moving your blog ;)
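If editing .htaccess is not an option, the same permanent redirect can be sent from PHP instead. To be clear, this is a different technique from the .htaccess route suggested above, and only a sketch. It assumes an `index.php` at the site root is served as the home page, and `redirect_header_line` is a made-up helper name.

```php
<?php
// Hypothetical index.php for the site root (an assumption, not from the thread).

/**
 * Build the Location header line for a redirect target.
 */
function redirect_header_line($target_url)
{
    return 'Location: ' . $target_url;
}

// Only send headers when running under a web server, so the file can
// also be run from the command line without side effects.
if (PHP_SAPI !== 'cli') {
    // The third argument sets the HTTP status code: 301 = moved permanently
    header(redirect_header_line('http://marunchak.co.uk/blog/'), true, 301);
    exit;
}
```

Unlike a "refresh" in the page itself, the 301 status is sent in the HTTP response, so a crawler sees the move before it reads any page content.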

6 Jul 16, 2008 11:32

Well, the refresh is something I only added yesterday. :[

I'm in the process of moving stuff about. Thanks for your answers. Just one more question: what's the disadvantage of editing the htaccess?

Until I get a new website setup, the blog is the only thing I want people to see.

In any case, here's my _stats.php

<?php
/**
 * This is b2evolution's stats config file.
 *
 * @deprecated TODO: It holds now just things that should be move around due to hitlog refactoring.
 *
 * This file sets how b2evolution will log hits and stats
 * Last significant changes to this file: version 1.6
 *
 * @package conf
 */
if( !defined('EVO_CONFIG_LOADED') ) die( 'Please, do not access this page directly.' );


/**
 * Self referers that should not be considered as "real" referers in stats.
 * This should typically include this site and maybe other subdomains of this site.
 *
 * The following substrings will be looked up in the referer http header
 * in order to identify referers to hide in the logs
 *
 * WARNING: you should *NOT* use a slash at the end of simple domain names, as
 * older Netscape browsers will not send these. For example you should list
 * http://www.example.com instead of http://www.example.com/ .
 *
 * @todo move to admin interface (T_basedomains), but use for upgrading
 *
 * TODO: handle multiple blog roots.
 *
 * @global array
 */
$self_referer_list = array(
	'://'.$basehost,			// This line will match all pages from your the host of your $baseurl
	'://www.'.$basehost,	// This line will also match www.you_base_host in case you have no www. on your basehost
	'http://localhost',
	'http://127.0.0.1',
);


/**
 * Blacklist: referrers that should not be considered as "real" referers in stats.
 * This should typically include stat services, online email services, online aggregators, etc.
 *
 * The following substrings will be looked up in the referer http header
 * in order to identify referers to hide in the logs
 *
 * THIS IS NOT FOR SPAM! Use the Antispam features in the admin section to control spam!
 *
 * WARNING: you should *NOT* use a slash at the end of simple domain names, as
 * older Netscape browsers will not send these. For example you should list
 * http://www.example.com instead of http://www.example.com/ .
 *
 * @todo move to admin interface (T_basedomains), but use for upgrading
 *
 * @global array
 */
$blackList = array(
	// webmails
	'.mail.yahoo.com/',
	// stat services
	'sitemeter.com/',
	// aggregators
	'bloglines.com/',
	// caches
	'/search?q=cache:',		// Google cache
	// redirectors
	'googlealert.com/',
	// add your own...
);


/**
 * Search engines for statistics
 *
 * The following substrings will be looked up in the referer http header
 * in order to identify search engines
 *
 * @todo move to admin interface, include query params
 *
 * @global array $search_engines
 */
$search_engines = array(
	'google.',
	'.hotbot.',
	'.altavista.',
	'.excite.',
	'.voila.fr/',
	'http://search',
	'://suche.',
	'search.',
	'search2.',
	'http://recherche',
	'recherche.',
	'recherches.',
	'vachercher.',
	'feedster.com/',
	'alltheweb.com/',
	'daypop.com/',
	'feedster.com/',
	'technorati.com/',
	'weblogs.com/',
	'exalead.com/',
	'killou.com/',
	'buscador.terra.es',
	'web.toile.com',
	'metacrawler.com/',
	'.mamma.com/',
	'.dogpile.com/',
	'search1-1.free.fr',
	'search1-2.free.fr',
	'overture.com',
	'startium.com',
	'2020search.com',
	'bestsearchonearth.info',
	'mysearch.com',
	'popdex.com',
	'64.233.167.104',
	'seek.3721.com',
	'http://netscape.',
	'http://www.netscape.',
	'/searchresults/',
	'/websearch?',
	'http://results.',
	'baidu.com/',
	'reacteur.com/',
	'http://www.lmi.fr/',
	'kartoo.com/',
	'icq.com/search',
);


/**
 * UserAgent identifiers for logging/statistics
 *
 * The following substrings will be looked up in the user_agent http header
 *
 * @todo move to admin interface (T_useragents)
 *
 * 'type' aggregator currently gets only used to "translate" user agent strings.
 * An aggregator hit gets detected by accessing the feed.
 *
 * @global array $user_agents
 */
$user_agents = array(
	// Robots:
	array('robot', 'Googlebot/', 'Google (Googlebot)' ),
	array('robot', 'Slurp/', 'Inktomi (Slurp)' ),
	array('robot', 'Yahoo! Slurp;', 'Yahoo (Slurp)' ),
	array('robot', 'msnbot/', 'MSN Search (msnbot)' ),
	array('robot', 'Frontier/',	'Userland (Frontier)' ),
	array('robot', 'ping.blo.gs/', 'blo.gs' ),
	array('robot', 'organica/',	'Organica' ),
	array('robot', 'Blogosphere/', 'Blogosphere' ),
	array('robot', 'blogging ecosystem crawler',	'Blogging ecosystem'),
	array('robot', 'FAST-WebCrawler/', 'Fast' ),			// http://fast.no/support/crawler.asp
	array('robot', 'timboBot/', 'Breaking Blogs (timboBot)' ),
	array('robot', 'NITLE Blog Spider/', 'NITLE' ),
	array('robot', 'The World as a Blog ', 'The World as a Blog' ),
	array('robot', 'daypopbot/ ', 'DayPop' ),
	array('robot', 'Bitacle bot/', 'Bitacle' ),
	array('robot', 'Sphere Scout', 'Sphere Scout' ),
	array('robot', 'Gigabot/', 'Gigablast (Gigabot)' ),
	// Unknown robots:
	array('robot', 'psycheclone', 'Psycheclone' ),
	// Aggregators:
	array('aggregator', 'AppleSyndication/', 'Safari RSS (AppleSyndication)' ),
	array('aggregator', 'Feedreader', 'Feedreader' ),
	array('aggregator', 'Syndirella/',	'Syndirella' ),
	array('aggregator', 'rssSearch Harvester/', 'rssSearch Harvester' ),
	array('aggregator', 'Newz Crawler',	'Newz Crawler' ),
	array('aggregator', 'MagpieRSS/', 'Magpie RSS' ),
	array('aggregator', 'CoologFeedSpider', 'CoologFeedSpider' ),
	array('aggregator', 'Pompos/', 'Pompos' ),
	array('aggregator', 'SharpReader/',	'SharpReader'),
	array('aggregator', 'Straw ',	'Straw'),
);

?>

I didn't actually set b2evolution up myself; it was installed for me by my host, so I might have to get on to them.
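For reference, the comment on `$user_agents` in the file above says its substrings are looked up in the user_agent HTTP header. A rough sketch of what such a lookup could look like follows. This is not b2evolution's actual hit-logging code: `classify_user_agent` is a hypothetical helper, the array is a shortened copy of the one above, and case-sensitive matching is an assumption.

```php
<?php
// Shortened copy of the $user_agents array from _stats.php
$user_agents = array(
    array('robot', 'Googlebot/', 'Google (Googlebot)'),
    array('robot', 'msnbot/', 'MSN Search (msnbot)'),
    array('aggregator', 'MagpieRSS/', 'Magpie RSS'),
);

/**
 * Hypothetical helper: return the type and display name of the first
 * entry whose signature substring occurs in the User-Agent header,
 * or null if no entry matches (e.g. an ordinary browser).
 */
function classify_user_agent($header, $user_agents)
{
    foreach ($user_agents as $entry) {
        list($type, $signature, $name) = $entry;
        // Case-sensitive substring match (an assumption about the real behaviour)
        if (strpos($header, $signature) !== false) {
            return array('type' => $type, 'name' => $name);
        }
    }
    return null;
}

$hit = classify_user_agent(
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    $user_agents
);
echo $hit['type'], ': ', $hit['name'], "\n"; // prints "robot: Google (Googlebot)"
```

Under this reading, a stats page showing 0 robots means no request with a matching User-Agent has reached the blog, which would point to the bots never arriving rather than to the config file.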

7 Jul 16, 2008 12:02

There is no real disadvantage to editing your htaccess and it would allow googy bot to find your blog.

Alternatively add a link to your blog on your homepage so that the bots can follow that.

¥

8 Jul 16, 2008 20:48

I added it to Google manually using that form they have, and I got 243 bot hits in the next 24 hours. Also, I set up Google AdSense and that seems to be providing context-based ads now. I guess I did something right. :p

