1 jfeise Oct 23, 2011 06:10
3 jfeise Oct 26, 2011 04:26
I verified that using ~ as the delimiter is indeed the cause of this problem.
Please select another character as the delimiter for the regex pattern matching, one that is not as likely to appear in a URL.
4 fplanque Oct 26, 2011 04:43
Thanks for pointing this out. We do need to change the character.
5 sam2kb Oct 26, 2011 05:54
What happens if you add a ¤ char to the URL? Not that it's a common char in URLs, but still... it will break the regexp.
What we really need to do is escape the delimiter char:
$blog_baseuri_regexp = '~^'.preg_quote( $blog_baseuri_regexp, '~' ).'(\.php[0-9]?)?/(.+)$~';
6 jfeise Oct 26, 2011 06:11
Well, a ¤ char is not a valid character in a URL.
RFC 2396 Appendix A specifies the BNF for URIs, including the allowed characters.
Any other character would have to be escaped as % HEX HEX.
7 sam2kb Oct 26, 2011 06:14
RFC 2396 Appendix A won't stop me from pasting ¤ into the address bar and hitting enter ;)
An escaped delimiter will not break the regex, whether it's ¤ or ~.
8 sam2kb Oct 26, 2011 06:15
Fixed in CVS
9 jfeise Oct 26, 2011 06:37
sam2kb wrote:
RFC 2396 Appendix A won't stop me from pasting ¤ into the address bar and hitting enter ;)
Well, sure. But don't expect the web server to even give it to b2evo ;)
Most likely, the web server will bail out before it hits b2evo. They do a bunch of sanity checks, for security reasons. Nothing worse than a buffer overflow due to bad characters in the URL...
10 sam2kb Oct 26, 2011 06:41
What I mean is it's perfectly safe to use any delimiter character as long as it's escaped. That's what the second parameter of the preg_quote() function is for.
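To illustrate the point about the second parameter, here is a minimal sketch. The base URI and the request URL are made-up values modelled on the ones in this thread, and the variable names are only for illustration; this is not the actual b2evo code.

```php
<?php
// Hypothetical base URI, modelled on the one reported in this thread.
$baseuri = 'http://localhost/~jfeise/blogs/index';

// Passing '~' as the second argument makes preg_quote() escape the
// delimiter as well, so a '~' inside the URI can no longer end the pattern.
$quoted  = preg_quote( $baseuri, '~' );
$pattern = '~^'.$quoted.'(\.php[0-9]?)?/(.+)$~';

// Hypothetical request URL under that base URI.
$url     = 'http://localhost/~jfeise/blogs/index.php/2011/10/some-post';
$matched = preg_match( $pattern, $url, $m );

// $m[2] holds whatever follows the base URI and the optional ".php" part.
echo $m[2], "\n";
```

With the delimiter escaped, the choice of delimiter character no longer depends on what can show up in the URL.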
I think I have an idea what is going wrong with this.
I checked the CVS history, and I think the problem is this change:
Revision 1.173: Replace non-ASCII character in regular expressions with ~
Diffing the code, I see this:
I think the problem is the ~.
My URL is http://localhost/~jfeise/blogs/...
So I think the regex trips over the ~ in my URL, because ~ is now also the pattern delimiter in the preg_replace and preg_quote calls. Avoiding that clash may have been the main reason to use a non-ASCII character as the delimiter in the first place.
The "unknown modifier 'j'" error would then be the result of the 'j' being the first character after the ~ in the URL.
I'll revert this in my code tonight to see if that's really the issue.
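A small sketch of what I suspect is happening, assuming a base URI like mine. The values and variable names are made up for the example and are not taken from the b2evo source.

```php
<?php
// Hypothetical base URI containing the delimiter character.
$baseuri = 'http://localhost/~jfeise/blogs/index';

// No delimiter is passed to preg_quote() here, so the '~' stays unescaped.
$pattern = '~^'.preg_quote( $baseuri ).'(\.php[0-9]?)?/(.+)$~';

// PCRE takes the '~' in front of "jfeise" as the closing delimiter and then
// tries to read "j" as a pattern modifier, so the pattern fails to compile.
// The @ only hides the warning; preg_match() still returns false.
$result = @preg_match( $pattern, 'http://localhost/~jfeise/blogs/index.php/some-post' );

var_dump( $result );
```

If that is right, preg_match() returns false here instead of 0 or 1, and without the @ PHP reports the warning about the unknown modifier.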