
1 May 31, 2006 22:12    

I don't think this is really a bug, and you may not want to take any action, but I thought I should bring it up so everyone is aware. One of the users of my 1.8-dev install reported that a blogspot.com user was getting the 403 page when trying to click through to the b2evolution blog from mr_plenty.blogspot.com. When I didn't find any matches in the antispam list, I traced the problem to the validate_url() function in /inc/_misc/_misc.funcs.php:

        if( ! preg_match('~^               # start
                ([a-z][a-z0-9+.\-]*)             # scheme
                ://                              # authority absolute URLs only
                (\w+(:\w+)?@)?                   # username or username and password (optional)
                [a-z0-9]([a-z0-9.\-])*           # Don t allow anything too funky like entities
                (:[0-9]+)?                       # optional port specification
                (/|$)~ix', $url, $matches) ) 

I changed

[a-z0-9]([a-z0-9.\-])*           # Don t allow anything too funky like entities


to

[a-z0-9]([a-z0-9.\-_])*           # Don t allow anything too funky like entities


so that the underscore is allowed in the host name. I know it's not a proper subdomain, but blogspot and some other crap sites apparently allow it. I suggested that the blogspot user change their URL to something more valid to prevent this and other problems (e.g., my Squid proxy can't go to the site at all).
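
For anyone who wants to try the change outside of b2evolution first, here is a rough standalone sketch. The function name url_host_passes() and its $allow_underscore flag are made up for illustration and are not part of b2evolution; the pattern itself is the one quoted above, with the host character class swapped depending on the flag.

```php
<?php
// Hypothetical helper, NOT part of b2evolution: runs the URL pattern quoted
// above, with or without the underscore in the host character class.
function url_host_passes(string $url, bool $allow_underscore): bool
{
    // Original class is a-z0-9.\-  and the patched class adds _
    $host_chars = $allow_underscore ? 'a-z0-9.\-_' : 'a-z0-9.\-';

    $pattern = '~^
        ([a-z][a-z0-9+.\-]*)              # scheme
        ://                               # authority, absolute URLs only
        (\w+(:\w+)?@)?                    # optional username or username:password
        [a-z0-9]([' . $host_chars . '])*  # host name
        (:[0-9]+)?                        # optional port
        (/|$)~ix';

    return preg_match($pattern, $url) === 1;
}

// The referrer from this thread, checked against both variants.
var_dump(url_host_passes('http://mr_plenty.blogspot.com/', false));
var_dump(url_host_passes('http://mr_plenty.blogspot.com/', true));
```

With the original character class the match stops at the underscore and the URL is rejected; with the underscore added, the same URL passes.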

As to whether it's valid, [url=http://rfc.net/rfc1034.html]RFC 1034[/url] says:

The labels must follow the rules for ARPANET host names. They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen.

That doesn't mention underscores, so I guess they're not kosher. For my part, I'm fine with keeping things the way they are. But if the issue comes up, and someone searches on it, then there's the explanation and a fix if they're so inclined.
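
If you'd rather check a host name against the RFC wording directly, something like the following should do it. This is only a sketch of the rule as quoted above plus the 63-character label limit from the same RFC; is_rfc1034_label() and is_rfc1034_hostname() are names I made up, not existing b2evolution or PHP functions.

```php
<?php
// Hypothetical helper: checks a single DNS label against the RFC 1034 rule
// quoted above (letter first, letter or digit last, and only letters,
// digits, and hyphens in between).
function is_rfc1034_label(string $label): bool
{
    // RFC 1034 also limits a label to 63 characters.
    return strlen($label) <= 63
        && preg_match('~^[a-z]([a-z0-9\-]*[a-z0-9])?$~iD', $label) === 1;
}

// Hypothetical helper: a host name passes only if every dot-separated
// label passes.
function is_rfc1034_hostname(string $host): bool
{
    foreach (explode('.', $host) as $label) {
        if (!is_rfc1034_label($label)) {
            return false;
        }
    }
    return true;
}

var_dump(is_rfc1034_hostname('mr_plenty.blogspot.com'));
var_dump(is_rfc1034_hostname('mr-plenty.blogspot.com'));
```

Note that this is stricter than the b2evolution pattern, which also lets a host start with a digit.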

2 Jun 01, 2006 00:13

Thanks. I always wondered about the ONE keyword with a _ in it. Under the new method it'll go away when its time is up, but I always wondered *why* that keyword would be there. Now I know: to block nothing!

I also didn't know some sites would allow a _ in the subdomain portion.

