
mod_proxy_html and special characters

Hi all,

I'm currently facing an issue where the directive ProxyHTMLURLMap does not rewrite a URL that contains special characters. I am not sure whether that is by design, and I would appreciate some feedback.

Let's assume an imaginary backend server delivers a HTML page that contains a link like this:

<a href="http://internal/!%22%23$/">A link with special characters</a>

Please note that %22 is the double quote, which must be encoded so that it does not break the HTML attribute, and %23 is the '#' character, which is encoded so that it is not treated as the start of an anchor in this case. So, the unencoded URL would look like this:

http://internal/!"#$/

Now, Apache configured as a reverse proxy should rewrite this link to http://external/!"#$/ (or http://external/!%22%23$/), but should not rewrite any links outside the subdirectory /!"#$/ (or /!%22%23$/). An imaginary configuration that achieves this, and that showcases the issue I am trying to get feedback on, looks like this:

ProxyHTMLURLMap "http://internal/!\"#$/" "http://external/!\"#$/"

Please note that the double quote is only escaped here with a backslash to cater for the Apache configuration syntax requirements. This does not work, i.e. the URL in the HTML document doesn't get rewritten.

Let's try to better understand what exactly is happening here. Looking into the code of mod_proxy_html.c (trunk, SVN rev. 1832252), this is where the string comparison happens:

    s_from = strlen(m->from.c);
    if (!strncasecmp(ctx->buf, m->from.c, s_from)) {
        ... do the string replacement ...

... where ctx->buf is the URL found in the HTML document, and m->from.c is the first configured argument of ProxyHTMLURLMap. So, if the latter is a prefix of the former, this condition should be true and the string replacement should happen. In the case where the expected string replacement doesn't happen, the condition is false and the values of the variables are:

ctx->buf  = http://internal/!%22%23$/
m->from.c = http://internal/!"#$/

So, the strings don't match and are not replaced for that reason.
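To make the mismatch easy to reproduce outside of Apache, here is a minimal sketch of the same prefix test. The helper name has_prefix_nocase is hypothetical and not part of mod_proxy_html; it only mirrors the strlen/strncasecmp pattern quoted above.

```c
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical helper, not taken from mod_proxy_html.c.
 * Returns 1 if `from` is a case-insensitive prefix of `url`, else 0,
 * using the same strlen + strncasecmp pattern as the quoted code. */
static int has_prefix_nocase(const char *url, const char *from)
{
    size_t s_from = strlen(from);
    return strncasecmp(url, from, s_from) == 0;
}
```

Called with the two values listed above, the helper reports no match, because the comparison is byte-wise and '%' differs from '"' at the first special character. It only matches if the configured string uses the same percent-encoded form as the document.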

Going forward, I am not interested in finding a workaround for this, but rather in how to approach a fix (if this is a bug at all).

Is it reasonable to expect mod_proxy_html to rewrite URL encoded URLs as well?

Let's assume this needs to be fixed. To make the strings match, we could either URL-encode the value from the Apache directive ProxyHTMLURLMap, or temporarily URL-decode the string found in the HTML document just for the purpose of the string comparison. What is the right thing to do?
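For discussion, the second option (decode the document's URL only for the comparison) could look roughly like the sketch below. This is not a patch against mod_proxy_html: the function names, the fixed 1024-byte scratch buffer, and the choice to leave malformed escapes and %00 untouched are all my own assumptions.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode %XX sequences from src into dst (dst_size includes the NUL).
 * Malformed escapes and %00 are copied through unchanged.
 * Returns 0 on success, -1 if dst is too small. */
static int percent_decode(const char *src, char *dst, size_t dst_size)
{
    size_t o = 0;
    if (dst_size == 0) return -1;
    while (*src) {
        char c = *src;
        if (c == '%') {
            int hi = hex_val(src[1]);
            int lo = (hi >= 0) ? hex_val(src[2]) : -1;
            if (hi >= 0 && lo >= 0 && (hi * 16 + lo) != 0) {
                c = (char)(hi * 16 + lo);
                src += 2;  /* skip the two hex digits */
            }
        }
        if (o + 1 >= dst_size) return -1;
        dst[o++] = c;
        src++;
    }
    dst[o] = '\0';
    return 0;
}

/* Hypothetical comparison: decode the document URL, then apply the
 * same case-insensitive prefix test against the configured value. */
static int matches_after_decoding(const char *url, const char *from)
{
    char decoded[1024];  /* assumed scratch size for this sketch */
    if (percent_decode(url, decoded, sizeof decoded) != 0) return 0;
    return strncasecmp(decoded, from, strlen(from)) == 0;
}
```

With this, http://internal/!%22%23$/ would match the configured http://internal/!"#$/. One thing such a change would have to deal with is that the matched prefix in the original buffer is longer than strlen(m->from.c), so the replacement step would need to know how many encoded bytes were consumed. Encoding the configured value instead avoids that, but then the result depends on which characters the backend chooses to encode ('!' and '$' could also legitimately appear as %21 and %24).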

If you have managed to read all this down to this line, I am curious about your feedback. :)