1 circlecast Sep 08, 2005 07:22
3 edb Sep 08, 2005 17:37
Thanks. I sort of figured there were more aggregators out there than the few listed. I'm not sure which forum it belongs in (probably not this one), but since I've always been on top of logging robots as robots to not muck up my "heavy hitters" section I can post a very full list of robots to add to that array. BTW since I don't care what they really are I often fill the text side with "something new on DATE". All I care is that robots get logged as robots. Anyway I'll post my list somewhere, and maybe with some helpful SQL strings for those who want to make their hitlog table accurate.
I guess I should be wondering if some of the things I'm calling robots were actually aggregators. Probably I would have noticed if a large number of "direct" hits were logged against xml-type pages? Hmmm... Since those are the only ones I would have corrected, I probably have lots of hits from aggregators that aren't counted in the "by aggregator" section.
AARGH! Too much thinking!!!
I have recently updated my $user_agents array in conf/_stats.php based on my blog's hit log:
However, it appears the backoffice user agent summary groups the robots and aggregators by the full user agent string and not by the short version of it. After updating your $user_agents array, you are going to see several entries for the Yahoo Feed Seeker and the like, since each version has its own user agent string...
It might be possible to group all the user agents by their short user agent version, and also to update that list automatically. However, that implies the short user agent version becomes a database table of its own. It also implies refactoring the whole hit log management, since that table grows a lot when you have a popular blog. My blog's hitlog table is 13 MB and it keeps only the last 7 days of visits!
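To illustrate the grouping idea, here is a minimal sketch in Python (not b2evolution's PHP). The signature list, the short names and the sample user agent strings are all made up for the example; they are not the real contents of conf/_stats.php.

```python
from collections import Counter

# Hypothetical signature list: (substring to look for, short name, type).
# Illustrative only, not the actual $user_agents array.
USER_AGENTS = [
    ("YahooFeedSeeker", "Yahoo Feed Seeker", "aggregator"),
    ("Googlebot", "Googlebot", "robot"),
]


def classify(full_agent):
    """Return (short name, type) for a full user agent string."""
    for signature, short_name, agent_type in USER_AGENTS:
        if signature in full_agent:
            return short_name, agent_type
    # Unknown agents keep their full string and count as browsers here.
    return full_agent, "browser"


def summarize(agents):
    """Count hits per short name instead of per full user agent string."""
    return Counter(classify(agent)[0] for agent in agents)


# Made-up sample hits: two versions of the same aggregator, one robot.
hits = [
    "YahooFeedSeeker/1.0 (sample)",
    "YahooFeedSeeker/2.0 (sample)",
    "Googlebot/2.1 (sample)",
]
summary = summarize(hits)
```

With this kind of lookup, both Yahoo Feed Seeker versions end up under one entry in the summary instead of one entry per version.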
It would be very interesting to split that hitlog table into several smaller ones:
a hitlog table (referencing the following tables to make it shorter, avoiding redundant information);
a referrer table (referencing a new base domain referrer table);
a user agent table (referencing a new short description user agent table).
In addition to that, the hit_remote_addr entry of the hitlog table might become a 32-bit or 128-bit numeric entry making it shorter and quicker to manipulate.
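A rough sketch of what that split could look like, using Python with an in-memory SQLite database just to keep it self-contained. Every table and column name other than hit_remote_addr is an assumption made up for the example, and so is the sample data. Note that an SQLite INTEGER column holds at most 64 bits, so the sketch stores IPv4 addresses only; a 128-bit IPv6 value would need a different column type.

```python
import ipaddress
import sqlite3


def ip_to_int(addr):
    """Convert a dotted IPv4 (or textual IPv6) address to an integer."""
    return int(ipaddress.ip_address(addr))


def int_to_ip(value, version=4):
    """Convert an integer back to its textual address."""
    cls = ipaddress.IPv4Address if version == 4 else ipaddress.IPv6Address
    return str(cls(value))


# Hypothetical schema: names are illustrative, not b2evolution's own.
SCHEMA = """
CREATE TABLE useragent_short (
    short_id   INTEGER PRIMARY KEY,
    short_name TEXT UNIQUE NOT NULL
);
CREATE TABLE useragent (
    agent_id  INTEGER PRIMARY KEY,
    agent_sig TEXT UNIQUE NOT NULL,
    short_id  INTEGER REFERENCES useragent_short(short_id)
);
CREATE TABLE basedomain (
    dom_id   INTEGER PRIMARY KEY,
    dom_name TEXT UNIQUE NOT NULL
);
CREATE TABLE referrer (
    ref_id  INTEGER PRIMARY KEY,
    ref_url TEXT UNIQUE NOT NULL,
    dom_id  INTEGER REFERENCES basedomain(dom_id)
);
CREATE TABLE hitlog (
    hit_id          INTEGER PRIMARY KEY,
    hit_remote_addr INTEGER NOT NULL,  -- IPv4 address stored as a number
    ref_id          INTEGER REFERENCES referrer(ref_id),
    agent_id        INTEGER REFERENCES useragent(agent_id)
);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)

# Two full user agent strings sharing one short description (sample data).
db.execute("INSERT INTO useragent_short VALUES (1, 'Yahoo Feed Seeker')")
db.execute("INSERT INTO useragent VALUES (1, 'YahooFeedSeeker/1.0 (sample)', 1)")
db.execute("INSERT INTO useragent VALUES (2, 'YahooFeedSeeker/2.0 (sample)', 1)")

# Each hit row only carries small integers instead of repeated long strings.
for addr, agent_id in [("192.0.2.1", 1), ("192.0.2.2", 2), ("192.0.2.1", 2)]:
    db.execute(
        "INSERT INTO hitlog (hit_remote_addr, ref_id, agent_id) VALUES (?, NULL, ?)",
        (ip_to_int(addr), agent_id),
    )

# Summary grouped by short description rather than by full string.
rows = db.execute(
    """
    SELECT s.short_name, COUNT(*)
    FROM hitlog h
    JOIN useragent a ON a.agent_id = h.agent_id
    JOIN useragent_short s ON s.short_id = a.short_id
    GROUP BY s.short_name
    """
).fetchall()
```

The long strings are stored once each in the lookup tables, so the hitlog rows stay small, and the per-short-name summary becomes a plain join and GROUP BY.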