1 village_idiot Oct 25, 2006 20:43
3 village_idiot Oct 26, 2006 04:21
I did... so a domain that's parked on GoDaddy reported me, and so did one that's not even working :P
interesting ...
its fixed, thanks :)
4 village_idiot Oct 26, 2006 11:04
I thought about this a little more, and doesn't something like this happening raise the question of some sort of central whitelist that is NOT user-driven?
I guess I'm peeved since I KNOW I do not spam, and am therefore undeserving of being reported once, much less twice.
That, or maybe the validity of reports ought to be checked somehow. At least something to make sure the fricken sites actually function??
Also, the deprecation? Does that ensure it won't happen again?
5 edb Oct 26, 2006 12:19
*WARNING: long reply*
The central list never released your domain to b2evolution users.
Reporting makes a draft post in a multiuser blog, and that's all. Each person who reports the same character string (which is the title of the post btw) gets added to the list, and the date of the draft gets updated. There is no way to know what the date of the first report of your domain was - just that it happened. Now we have a date stamp for each reporter, so now we can tell if a spammer is being reported in dribs and drabs or as an avalanche. The former might not be a spammer, the latter probably is.
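To make the reporting flow concrete, here is a minimal sketch in Python. It is not the actual central code: the `Draft` class, the `report` and `looks_like_avalanche` functions, and the window and threshold parameters are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class Draft:
    """One draft post; the title is the reported character string."""
    title: str
    # Hypothetical layout: reporter name -> date stamp of their latest report.
    reporters: Dict[str, datetime] = field(default_factory=dict)
    updated: Optional[datetime] = None  # bumped on every new report


def report(drafts: Dict[str, Draft], title: str, reporter: str, when: datetime) -> Draft:
    """Record a report: create the draft if needed, add the reporter, bump the date."""
    draft = drafts.setdefault(title, Draft(title))
    draft.reporters[reporter] = when
    draft.updated = when
    return draft


def looks_like_avalanche(draft: Draft, window: timedelta, min_reporters: int) -> bool:
    """True if at least min_reporters date stamps fall inside one window."""
    times = sorted(draft.reporters.values())
    for i in range(len(times) - min_reporters + 1):
        if times[i + min_reporters - 1] - times[i] <= window:
            return True
    return False
```

With per-reporter date stamps, reports that trickle in over months never fit inside a short window, while a burst from many reporters does.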
2 reports NEVER turn into a published post, which is how a keyword is added to the list. Asking for the updated list means telling central "this is the date and time of the last published post that I got, so tell me all the post titles published since then". Way WAY back it happened, but the list grows by (literally) hundreds of reporters daily. Not hundreds of posts/keywords, but reporters, who may or may not be reporting something for the first time. Anyway, to get turned into a keyword takes at least 10 reporters in fairly short order, but generally I don't publish them until I see a bunch more reporters and a handful of subdomain variations come in.
So to get back to the point: your domain never got delivered to b2evo users who asked for the latest keyword list. Oh, and as near as I can tell everyone who plays a lot in the forums gets reported. Sometimes we really feel the love, ya know?
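A rough sketch of that update request, assuming the published keywords are available as a mapping from title to publish date. The function names and the data layout are hypothetical; the only number taken from the post is the minimum of 10 reporters, and publishing itself remains a manual decision.

```python
from datetime import datetime
from typing import Dict, List

# "At least 10 reporters" per the post; an admin still decides by hand.
MIN_REPORTERS = 10


def keywords_since(published: Dict[str, datetime], last_seen: datetime) -> List[str]:
    """Return titles published after the client's last known publish date, oldest first."""
    newer = [(at, title) for title, at in published.items() if at > last_seen]
    return [title for at, title in sorted(newer)]


def worth_a_look(reporter_count: int) -> bool:
    """A draft below the minimum is not even a candidate for publishing."""
    return reporter_count >= MIN_REPORTERS
```

A draft with 2 reporters fails `worth_a_look`, so it never becomes a published post and never shows up in anyone's `keywords_since` result.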
The deprecation thing is how we, as antispam central admins, tell ourselves to not bother considering a post as something that might need to be published. Typically that's because it matches something already published. Like for example if we publish .foobar.tld there is no need to publish spam.foobar.tld no matter how heavily it gets reported. Something I started doing when the database was so big it was choking the server it lived on was to delete old drafts and old deprecates. Drafts older than 8 months are probably never going to be published, so they go byebye. Deprecates last 16 months, so deprecating your domain means antispam admins won't see it as a draft (potential keyword) for 16 months AFTER the last person reports it. Like draft posts, deprecated posts get a new date stamp with each new reporter.
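The two rules in that paragraph can be sketched like this. It assumes keywords match as plain substrings and approximates a month as 30 days; the dictionary layout and function names are made up for the example, not taken from the real database.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

# Assumption: a month is approximated as 30 days.
DRAFT_MAX_AGE = timedelta(days=8 * 30)
DEPRECATED_MAX_AGE = timedelta(days=16 * 30)


def covered_by_published(candidate: str, published: Iterable[str]) -> bool:
    """True if an already published keyword is contained in the candidate string."""
    return any(keyword in candidate for keyword in published)


def prune(posts: List[Dict], now: datetime) -> List[Dict]:
    """Drop drafts and deprecates whose LAST report is older than the limit."""
    kept = []
    for post in posts:
        age = now - post["last_reported"]
        if post["status"] == "draft" and age > DRAFT_MAX_AGE:
            continue
        if post["status"] == "deprecated" and age > DEPRECATED_MAX_AGE:
            continue
        kept.append(post)
    return kept
```

Because every new reporter refreshes the date stamp, the clock in `prune` only starts running once the reports stop.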
A central whitelist would be possible I guess. Sorta like a stamp of approval, but the list would be HUGE. There are far more normal webs out there than there are spammy ones eh? The database growing without limits is what was killing the server back when central went dark. Francois had to take it down to keep the rest of the site running. It found a new home - dedicated box - and I set some rules in place to prune it back from time to time.
I think I covered most of it, but I also take advantage of the protected status. When a really old keyword stops getting new reporters I have to wonder if the offending site has gone dark or possibly changed their evil ways. What I started doing, and only to help stop the keyword list from growing, was to change a post from published to protected so that it would no longer be a keyword (although every user who already has it will have it until they personally remove it), but the database would know that once upon a time they were deemed evil.
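As a sketch of how the protected status behaves, assuming four post statuses and that only published posts are ever served as keywords. The `Status` enum and both function names are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class Status(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    PROTECTED = "protected"
    DEPRECATED = "deprecated"


def served_keywords(posts: Dict[str, Status]) -> List[str]:
    """Only published posts go out to users who ask for the list."""
    return sorted(title for title, status in posts.items() if status is Status.PUBLISHED)


def retire(posts: Dict[str, Status], title: str) -> None:
    """Move a stale keyword from published to protected; the record itself is kept."""
    if posts.get(title) is Status.PUBLISHED:
        posts[title] = Status.PROTECTED
```

Retiring a keyword shrinks what central serves from then on, but it cannot reach into the local lists of users who already downloaded it.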
Validation or verification prior to publishing a report (or making it a keyword) is all done by hand. Sometimes I can just look at the domain name and figure it's a spammer. Porn sites and pill sites are kinda easy, especially when I get a bunch of subdomains all getting a large number of reporters in a very short time frame. Sometimes you can't tell just by looking at the site, so you have to trust the fact that it's been reported heavily (over a short period of time) and go with it.
Trivial data:
- 4,232 published posts (keywords).
- Only 54 protected posts, and not a one of them has had a reporter since unpublishing.
- 81,218 draft posts. 81,218 different character combinations with at least one reporter in the past 8 months.
- 47,518 deprecated posts.
- The newest post ID is 213,560. Do the math if you want, but almost half of what's been there is now gone.
- Spammers actually comment-spam the central database. THAT is a sure-fire method of getting added to the published keyword list.
- xxx is a keyword that you get on initial installation AND the first time you ask for the list from central, yet still gets reported on a fairly regular basis.
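Doing the math on those numbers, assuming post IDs are handed out sequentially starting at 1 and that the four statuses listed are the only posts left in the database (neither is stated above):

```python
# Counts taken from the list above.
counts = {
    "published": 4232,
    "protected": 54,
    "draft": 81218,
    "deprecated": 47518,
}
newest_post_id = 213560

remaining = sum(counts.values())      # posts still in the database
gone = newest_post_id - remaining     # posts deleted along the way
share_gone = gone / newest_post_id    # fraction of all posts ever created

print(remaining, gone, round(share_gone * 100, 1))
```

Under those assumptions, 133,022 posts remain and 80,538 are gone.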
So the whitelist thing is possible, but it'll take an army to manage it. And a server the size of Chicago. I'm not into it though. I'm into killing spammers :>
Whoo, if you want to see the guts of the thing, send me a PM and I'll set you up with a username and password.
6 village_idiot Oct 26, 2006 19:56
warning: short reply:
If it wasn't published (sent out), then how come I got that infamous message when I clicked through to therangers' blog (linked in my sidebar)?
I'm not doubting you, just wondering.
And actually, I have access to look, just not to meddle, don't I? (Isn't it located in the same place everything else is?) :)
7 edb Oct 26, 2006 20:43
theranger could have locally banned you (or just some part of your domain) without reporting you. You are not on the list. As for the central database, it lives on its own server. It used to live where the main domain lives, but it moved when Francois kicked it out of the house.
8 village_idiot Oct 26, 2006 21:57
ahh oke, yes i do remember that now :)
Breaking out into multiple 5-character strings revealed only one combination that matched your domain. The matching string is "village-idiot.org", which is a pretty close match. That was reported by 2 domains with the last of them reporting you on 3/23/06. A quick peruse of your sidebar didn't show them there, but that doesn't mean they're not somewhere in a post.
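A search like that could look roughly like the sketch below. It assumes the lookup is a plain substring test of every 5-character slice of the domain against the stored post titles; the function names and the sample titles are invented for the example and are not real central data.

```python
from typing import Iterable, List, Set


def chunks(text: str, size: int = 5) -> Set[str]:
    """All overlapping slices of the given size, e.g. 'abcdef' -> {'abcde', 'bcdef'}."""
    return {text[i:i + size] for i in range(len(text) - size + 1)}


def find_matches(domain: str, titles: Iterable[str], size: int = 5) -> List[str]:
    """Return every stored title that contains at least one slice of the domain."""
    pieces = chunks(domain.lower(), size)
    return sorted(t for t in titles if any(p in t.lower() for p in pieces))


# Hypothetical stored titles, for illustration only.
titles = ["my-blog.example", "cheap-pills.example", "xxx"]
print(find_matches("www.my-blog.example", titles))
```

Short slices cast a wide net, which is why a near match such as a domain with or without a subdomain prefix still turns up.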
Local unreported bans - of course - won't be in the central database.
Oh that report was not published and is now deprecated.
Check your PMs, please.