2 jfeise Oct 25, 2011 23:45

I verified that using ~ as the delimiter is indeed the cause of this problem.
Please select another character as the delimiter for the regex pattern matching, one that is less likely to appear in a URL.
Thanks for pointing this out. We need to change the character indeed.
What happens if you add a ¤ char to the URL? Not that it's a common char in URLs, but still... it will break the regexp.
What we really need to do is escape the delimiter char:
$blog_baseuri_regexp = '~^'.preg_quote( $blog_baseuri_regexp, '~' ).'(\.php[0-9]?)?/(.+)$~';
Well, a ¤ char is not a valid character in a URL.
RFC 2396 Appendix A specifies the BNF for URIs, including the allowed characters.
Other characters would have to be escaped with % hex hex.
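To illustrate the percent-escaping mentioned above, here is a minimal sketch using PHP's rawurlencode(). It assumes the ¤ character is encoded as UTF-8 (bytes C2 A4); the variable names are made up for the example. Note that ~ is one of the unreserved "mark" characters in RFC 2396, which is why it can appear unescaped in a URL.

```php
<?php
// Assumption: the ¤ character is UTF-8 encoded, i.e. the two bytes C2 A4.
$currency_sign = "\xC2\xA4";

// Characters outside the allowed set are sent as % hex hex, one triplet per byte.
$encoded_currency = rawurlencode( $currency_sign );

// The tilde is an unreserved character and is left as-is (PHP 5.3 and later).
$encoded_tilde = rawurlencode( '~' );

echo $encoded_currency, "\n";
echo $encoded_tilde, "\n";
```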
RFC 2396 Appendix A won't stop me from pasting ¤ into the address bar and hitting enter ;)
An escaped delimiter will not break the regex, no matter whether it's ¤ or ~.
Fixed in CVS
sam2kb wrote:
RFC 2396 Appendix A won't stop me from pasting ¤ into the address bar and hitting enter ;)
Well, sure. But don't expect the web server to even give it to b2evo ;)
Most likely, the web server will bail out before it hits b2evo. They do a bunch of sanity checks, for security reasons. Nothing worse than a buffer overflow due to bad characters in the URL...
What I mean is that it's perfectly safe to use any delimiter character as long as it's escaped. That's what the second parameter of the preg_quote() function is for.
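A minimal sketch of that point, not the actual b2evo code: the base URI and the request URL below are hypothetical values modeled on the ones in this thread. Passing '~' as the second argument to preg_quote() makes it escape the tilde along with the usual regex metacharacters, so the pattern compiles even though the URL contains the delimiter.

```php
<?php
// Hypothetical base URI that contains the delimiter character.
$blog_baseuri = 'http://localhost/~jfeise/blogs/index';

// The second argument tells preg_quote() to also escape the delimiter.
$quoted = preg_quote( $blog_baseuri, '~' );

// Same pattern shape as the fix posted earlier in the thread.
$pattern = '~^'.$quoted.'(\.php[0-9]?)?/(.+)$~';

// Hypothetical request URL, used only for this example.
$url = 'http://localhost/~jfeise/blogs/index.php/2011/10/25/some-post';

$matched = preg_match( $pattern, $url, $parts );

var_dump( $matched );   // 1 when the pattern matched
var_dump( $parts[2] );  // the path captured after the base URI
```

The same call works unchanged if the delimiter is switched to some other character, as long as that character is also the one passed to preg_quote().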
I think I have an idea what is going wrong with this.
I checked the CVS history, and I think the problem is this change:
Revision 1.173: Replace non-ASCII character in regular expressions with ~
Diffing the code, I see this:
I think the problem is the ~.
My URL is http://localhost/~jfeise/blogs/...
So I think that, after the change, the ~ in my URL is being interpreted as a pattern delimiter in the preg_replace and preg_quote calls. That may have been the main reason to use a non-ASCII character as the delimiter.
The "unknown modifier 'j'" stuff would be the result of the 'j' being the first character after the ~ in the URL.
I'll revert this in my code tonight to see if that's really the issue.
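The suspected failure can be reproduced in isolation with a short sketch. It assumes the pattern is built by calling preg_quote() without a delimiter argument, which may or may not be what revision 1.173 actually does; the URLs are hypothetical. Without that argument the ~ in the URL is not escaped, so PCRE treats it as the closing delimiter and tries to read the following "jfeise/..." as modifiers.

```php
<?php
// Hypothetical base URI with a tilde, as in a per-user web directory.
$blog_baseuri = 'http://localhost/~jfeise/blogs/index';

// Assumption: no delimiter is passed, so preg_quote() leaves ~ unescaped.
$pattern = '~^'.preg_quote( $blog_baseuri ).'(\.php[0-9]?)?/(.+)$~';

// The pattern is cut short at the ~ inside the URL; 'j' is then parsed as a
// modifier, which is not valid. The @ only hides the warning for this demo.
$result = @preg_match( $pattern, 'http://localhost/~jfeise/blogs/index.php/some-post' );

var_dump( $result );  // false: the pattern failed to compile
```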