1 Feb 13, 2006 15:42    

just recently, the "referring searches" tab of my blog has been overrun with google hits showing "not a query - no params". the pattern suggests that google's spiders, which the engine used to log as "direct accesses", are now being logged as search result hits.

does anyone know if google has in fact changed the way its spiders work, or has something slipped in the way the b2 engine captures and archives its hits?

or is it possible that the hit sources are being spoofed, and i'm just being nailed?

2 Feb 13, 2006 16:04

define over-run?

I occasionally see google as a referer, absent the query.

3 Feb 13, 2006 17:38

what i mean by "over run" is that it is no longer occasional. i've been receiving a long string of hits, pushing all others off the normal stats view, that are logged in the "referring searches" tab. it's a new behavior.

i looked at the raw logs, and found entries like this:

212.28.146.64.transedge.com - - [13/Feb/2006:08:42:10 -0500] "GET **my file url** HTTP/1.1" 200 24917 "http://www.google.com/" "Mozilla/4.5 [en]C-CCK-MCD {C-UDP;HullUniASC} (Win95; I)"

so the reference gets logged as google, but the hit is actually coming from somewhere else.

4 Feb 19, 2006 06:42

[not a query - no params!] keeps popping up in my "Refering Searches" tab... Is there a fix for this one yet?

I did read other posts, and went ahead and added
'a9.com',
'www.google.',

to my conf/_stats.php file.

any other ideas?

5 Feb 19, 2006 21:41

I noticed that these entries seem to come from:

"POST /blog/htsrv/comment_post.php HTTP/1.1"

based on my web logs

I wonder how this is happening...

This is probably a new way to spam or try to spam the comments.

6 Feb 21, 2006 09:58

I have the same problem, and by 'overrun' I mean that all but two of the entries in my search stats are of this type this morning, and last night every single one was. This isn't an occasional problem any more. Any clues, anyone? I'm stumped.

7 Feb 21, 2006 12:31

lyrra sark wrote:

212.28.146.64.transedge.com - - [13/Feb/2006:08:42:10 -0500] "GET **my file url** HTTP/1.1" 200 24917 "http://www.google.com/" "Mozilla/4.5 [en]C-CCK-MCD {C-UDP;HullUniASC} (Win95; I)"

...so the reference gets logged as google, but the hit is actually coming from somewhere else.

I'm either confused by what you're saying above, or you simply don't understand how to read Apache's logs.

what you are looking at above, in a nutshell, is a hit from the host "212.28.146.64.transedge.com" (http://www.transedge.com/ is a broadband provider, an isp), and they found your site via Google, at such and such time ...
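For anyone following along, here is a rough sketch of how that log line breaks down into fields. It assumes the server writes Apache's "combined" log format, and the path `/placeholder` is only a stand-in for the URL that was elided in the original post. It is an illustration in Python, not anything b2evolution does itself.

```python
import re

# Apache "combined" log format: host, identity, user, [time], "request",
# status, bytes, "referer", "user agent".
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return the fields of one combined-format log line as a dict, or None."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

# The log line quoted above; "/placeholder" is an assumed stand-in path.
sample = (
    '212.28.146.64.transedge.com - - [13/Feb/2006:08:42:10 -0500] '
    '"GET /placeholder HTTP/1.1" 200 24917 "http://www.google.com/" '
    '"Mozilla/4.5 [en]C-CCK-MCD {C-UDP;HullUniASC} (Win95; I)"'
)
fields = parse_line(sample)
print(fields["host"])     # the client that made the request
print(fields["referer"])  # whatever the client sent as its Referer header
```

Keep in mind that the host field comes from the connection itself, while the referer and user agent are simply whatever the client chose to send.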

There's absolutely nothing that looks odd or malicious in that log snippet, other than the fact that the query string isn't there. Is that something I see? Yes.

Is it something I would concern myself with? No.

or is it possible that the hit sources are being spoofed, and i'm just being nailed?

For what purpose?

8 Feb 21, 2006 12:33

esanchez wrote:

I noticed that these entries seem to come from:

"POST /blog/htsrv/comment_post.php HTTP/1.1"

based on my web logs

I wonder how this is happening...

This is probably a new way to spam or try to spam the comments.

You noticed what entries? Paste a snippet from your RAW APACHE LOGS please. Not your cpanel user-friendly logs.

--

Y'all are wayyyy too paranoid and if I say that, that says a lot.

You do realize that websites are like "virtual things"?

9 Feb 21, 2006 19:44

Security: I agree with whoo. I don't see this as a security issue on the face of it.

Priority: this isn't the biggest deal in the world - site users are unaffected. But hey, it's kinda fun to surf the searchlogs, isn't it? It certainly helps in building sites when you know what the punters want.

Functionality: this is the issue. My search logs are more or less useless now.

I notice that only some installations seem to be affected by this. Or is it that nobody else cares about their searchlogs? If I'm in a small minority here I'll just put up with it - it's not that big a deal.

10 Feb 22, 2006 00:15

I noticed that the vast majority of google-driven traffic suddenly started showing up with the "not a query" text (or whatever it says). Looking was prompted by this thread. Dunno what it means but I think it's something weird, though not malicious.

You can fix this by banning google from your robots.txt file. It'll take a while of course, and it assumes google respects the robots file, but it'll stop google traffic. Google is evil.
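As a rough illustration of what such a robots.txt rule looks like and how a crawler that honours it would read it, here is a small sketch using Python's standard `urllib.robotparser`. The user agent name `Googlebot`, the name `SomeOtherBot` and the example.com URL are assumptions made for the example. Note that a rule like this only affects crawling; it does nothing about ordinary visitors or scripts that show up with a Google referer.

```python
from urllib.robotparser import RobotFileParser

# An assumed robots.txt that bans one crawler from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""

def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given user agent may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(allowed("Googlebot", "http://example.com/blog/"))
print(allowed("SomeOtherBot", "http://example.com/blog/"))
```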

11 Feb 22, 2006 02:12

I'm interested...

are hits from other search engines still showing the query string? Has anyone considered that perhaps the stat function is dropping the string?

I realize this flies in the face of my saying that I've seen the same thing. But I don't think I've seen it on my own blog to the degree y'all are seeing it. Makes me wonder....

I did google this phenom today, and late last year google's blog search was not sending a query string, however I tested it (blog search) today and it sent one fine to my site.

12 Feb 22, 2006 06:25

whoo wrote:

are hits from other search engines still showing the query string? Has anyone considered that perhaps the stat function is dropping the string?

Strike that thought. As soon as I walked away from the computer, I realized that the raw log snippet above provides that answer.

13 Feb 25, 2006 01:37

These are two lines from my apache log:

217.96.105.6 - - [24/Feb/2006:11:50:08 -0800] "GET /blog/index.php/WatchList/2005/12/20/the_christian_childrens_fund HTTP/1.0" 200 32533 "http://www.google.com/" "Mozilla/4.7 [en] (X11; I; HP-UX B.10.20 9000/782)"

217.96.105.6 - - [24/Feb/2006:11:52:00 -0800] "POST /blog/htsrv/comment_post.php HTTP/1.0" 200 704 "http://www.thechristianalert.org/blog/index.php/WatchList/2005/12/20/the_christian_childrens_fund" "Mozilla/4.7 [en] (X11; I; HP-UX B.10.20 9000/782)"

I've noticed that there are always two lines: one for the post, and one for the "comments"...

very weird...
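The two-line pattern above can be checked for across a whole log with something like the following. This is only a sketch: it assumes combined-format Apache logs, treats a referer of exactly `http://www.google.com/` as the "no query" case, and uses the comment handler path from the lines above. The function name `suspicious_hosts` is made up for the example.

```python
import re

# Pull out host, method, path and referer from a combined-format log line.
LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "(?P<referer>[^"]*)"'
)

BARE_GOOGLE = "http://www.google.com/"          # Google referer with no query string
COMMENT_PATH = "/blog/htsrv/comment_post.php"   # comment handler seen in the logs above

def suspicious_hosts(lines):
    """Hosts that arrive with a bare Google referer and later POST a comment."""
    seen_bare_google = set()
    flagged = set()
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        host, method = m.group("host"), m.group("method")
        if method == "GET" and m.group("referer") == BARE_GOOGLE:
            seen_bare_google.add(host)
        elif method == "POST" and m.group("path") == COMMENT_PATH and host in seen_bare_google:
            flagged.add(host)
    return flagged

sample_log = [
    '217.96.105.6 - - [24/Feb/2006:11:50:08 -0800] "GET /blog/index.php/WatchList/2005/12/20/the_christian_childrens_fund HTTP/1.0" 200 32533 "http://www.google.com/" "Mozilla/4.7 [en] (X11; I; HP-UX B.10.20 9000/782)"',
    '217.96.105.6 - - [24/Feb/2006:11:52:00 -0800] "POST /blog/htsrv/comment_post.php HTTP/1.0" 200 704 "http://www.thechristianalert.org/blog/index.php/WatchList/2005/12/20/the_christian_childrens_fund" "Mozilla/4.7 [en] (X11; I; HP-UX B.10.20 9000/782)"',
]
print(suspicious_hosts(sample_log))
```

A host that turns up here is only a candidate for a closer look; a real visitor could in principle produce the same pair of lines.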

14 Feb 27, 2006 16:16

actually, i think i do know how to read my raw logs.

i notice that when i use google myself, submitting a query that i can expect will bring up a reference to my site, and then use the link to go to my site, the log captures the query string the way it always did before, and it shows up normally in my "referring searches" tab.

when examining the immense raft of these "blank" searches, however, i am left with the impression, either that google has a new way of scraping sites that is logged differently than it used to be, or that there's some new technique for hit-spamming our sites. i notice that many, many of these "false" google hits are followed by comment attempts... as per esanchez's note.

and yes... in terms of security, it's not such a big issue, but in terms of rendering the "referring searches" view effectively useless, it's pretty annoying.

i'm going to modify either the logging routine or a config file, to stop logging these hits altogether. but that won't block the comment attempts.

15 Mar 07, 2006 17:58

ok. so, if someone with a better grasp of the particulars of this code could lend a hand, that would be appreciated.

what i want to do, is add another test to the _functions_hitlogs core routines, that will 1) check if the referring URL is supposedly google, 2) see if the referral has an empty query string, 3) if both are true, DO NOT log the hit.

one line of code could save a lot of aggravation.
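To make the three steps concrete, here is the logic sketched in Python. This is not b2evolution code and not the actual contents of _functions_hitlogs; the function name `should_log_hit` is hypothetical, and it assumes the search terms travel in a `q` parameter and that only google.com hosts need checking.

```python
from urllib.parse import urlsplit, parse_qs

def should_log_hit(referer):
    """Return False for a referer that claims to be Google but carries no search query."""
    parts = urlsplit(referer)
    host = (parts.hostname or "").lower()
    # Step 1: is the referring URL supposedly google?
    is_google = host == "google.com" or host.endswith(".google.com")
    # Step 2: does the referral carry a non-empty query ("q" is an assumption)?
    has_query = bool(parse_qs(parts.query).get("q"))
    # Step 3: skip the hit only when it is google AND the query is missing.
    return not (is_google and not has_query)

print(should_log_hit("http://www.google.com/"))
print(should_log_hit("http://www.google.com/search?q=b2evolution"))
```

The same test would translate into a single condition in PHP wrapped around whatever call records the hit, but where exactly that belongs in the core files is something to confirm against the code itself.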

16 Mar 11, 2006 20:14

Interesting theory about the spamming, Lyrra. For what it's worth I've noticed that these hits only ever seem to come from google.com, not any of the localised google sites, which all seem to come through normally. Which might indirectly support the hypothesis that some new spamming tool is generating them, rather than some new trick of Google's. I wonder if any other blogging system is experiencing the same thing?

17 Mar 11, 2006 20:31

I'm finding an occasional 'regular' google result now, and the not-dot-com versions are showing regular results. The vast majority from plain old google though are the "no search string" type. While randomly surfing somewhere I came across a message board that was talking about changes that google is doing, so maybe this issue is actually related to those changes?

I got like a bazillion hits in 10 seconds the other day to 2 different posts. Half were referred by b2evolution's main page and the other half by b2evolution's skins site. I thought that was mighty odd and decided whatever was up it probably had a single IP behind it, so I pretended to ban b2evolution.net and found my offending IP. I then did a whois on the IP and you'll never guess who it was: google!

I banned the IP, but again I think change is happening at google and the world just has to deal with it. I think the evil forces that power google are finally taking over. Remember: google backwards is elgoog. And elgoog backwards? google! Just a coincidence? I think not!!!

18 Mar 26, 2006 20:17

I am getting loads of bogus searches constantly and not just from google:

03/26/06 12:49:23 pm search.yahoo.com àíàëèçàòîð ÷èñòîòû ãàçîâ
03/26/06 12:32:11 pm search.yahoo.com ¹Œ•“`à
03/26/06 12:32:10 pm search.yahoo.com томска из космоса
03/26/06 12:22:51 pm search.netscape.com nitrogen dioxide, UV spectrum
03/26/06 12:22:50 pm search.yahoo.com ‚W‚W¯À
03/26/06 12:22:48 pm search.msn.com бакшт Е.Х.
03/26/06 10:30:29 am directory.google.com [not a query - no params!]
03/26/06 10:29:12 am dir.ualist.com [no query string found]
03/26/06 10:29:11 am search.yahoo.com Zavoruev
03/26/06 10:29:01 am search.yahoo.com оптика
03/26/06 10:29:00 am search.netscape.com "Sun's culmination"
03/26/06 10:28:59 am search.yahoo.com “Œ‹ž‰w\“à

This is despite removing search results from the public stats pages. Right now what I want to figure out is how to remove the ?disp=stats from every page. Just removing the link didn't work, as the page is still there and can still be accessed.

Or I want to block anyone trying to access that. If I could just remove whatever code allows a page url to be appended with ?disp=stats that would work too.

Heidi

19 Mar 26, 2006 21:04

The disp=stats parameter is no longer available if you upgrade to the latest version of b2evolution.

Sample Output:
410 Gone
b2evolution does no longer publish referer statistics publicly in order not to attract spam robots
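The behaviour in that sample output boils down to refusing any request that asks for the stats display. A minimal Python sketch of the idea follows; it is not how b2evolution implements it, the function name `status_for` is hypothetical, and the example.com URLs are placeholders.

```python
from urllib.parse import urlsplit, parse_qs

def status_for(url):
    """Return 410 for any URL asking for the public stats page, else 200."""
    params = parse_qs(urlsplit(url).query)
    if "stats" in params.get("disp", []):
        return 410  # Gone: the stats page is deliberately no longer served
    return 200

print(status_for("http://example.com/blog/index.php?disp=stats"))
print(status_for("http://example.com/blog/index.php"))
```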

Hope this helps...

Edgar

20 Mar 26, 2006 21:08

It does and it doesn't. I don't have time to upgrade a few due to excessive customization. In the meantime I would like to combat it somehow.

I do have the update installed on 1 domain but there are some things I don't like about the stats removal and some I do.

21 Apr 10, 2006 18:45

I'd kinda got bored with all this, and stopped looking at my stats search log. Today I went back, and what do you know? All the blank entries are gone, it's all back to normal. I hope it's the same for everyone else. I'm going to close the file and put this one in the same category as crop circles, men's nipples and George W. Bush. Over and out.

22 Apr 10, 2006 19:23

I found a way to combat the bogus search results, but to do that I had to copy the search url and strip out most everything but the search query. It has worked well so far. This is just one of the entries I added to my antispam list:

%82W%82W%90%AF%8D%C0

And if I decide it isn't working they show up at the top of my antispam list so deleting them would be easy. :)
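Roughly, the approach amounts to a substring check of the referring URL against the stored fragments. The sketch below assumes that is how the antispam list is matched, which may not be exactly what b2evolution does; the referer URL and its `p` parameter are made up for the example. The comparison ignores case because the hex digits in percent-escapes can appear in either case.

```python
# One fragment from the post above, kept percent-encoded exactly as copied.
BLOCKED_FRAGMENTS = ["%82W%82W%90%AF%8D%C0"]

def is_blocked(referer, fragments=BLOCKED_FRAGMENTS):
    """Return True if any blocked fragment occurs in the referring URL."""
    lowered = referer.lower()
    return any(fragment.lower() in lowered for fragment in fragments)

# Hypothetical referer carrying the blocked query string.
print(is_blocked("http://search.yahoo.com/search?p=%82W%82W%90%AF%8D%C0"))
print(is_blocked("http://search.yahoo.com/search?p=b2evolution"))
```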

Heidi
