
When I browse to http://localhost:8080/foo/%5B-%5D, the server (nc -l 8080) receives the path as-is:

GET /foo/%5B-%5D HTTP/1.1

However, when I proxy this application via nginx (1.1.19):

location /foo {
        proxy_pass    http://localhost:8080/foo;
}

The same request routed through the nginx port is forwarded with the path decoded:

GET /foo/[-] HTTP/1.1

The decoded square brackets in the GET path cause errors in the target server (HTTP Status 400 - Illegal character in path...) because they arrive unescaped.

Is there a way to disable URL decoding or encode it back so that the target server gets the exact same path when routed through nginx? Some clever URL rewrite rule?
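
To make the difference between the two request lines concrete, here is a minimal Python sketch of the decoding step and of "encoding it back". It only illustrates percent-encoding with the standard library; it is not how nginx implements normalization, and the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

# Path exactly as the browser sent it (taken from the question).
raw_path = "/foo/%5B-%5D"

# Percent-decoding yields the form the backend received through nginx.
decoded_path = unquote(raw_path)

# Re-encoding with the default safe set ("/") escapes the brackets again.
reencoded_path = quote(decoded_path)

print(decoded_path)
print(reencoded_path)
```

The round trip works for this particular path, but decoding is not reversible in general: `%2F` and a literal `/` both decode to `/`, so re-encoding cannot always restore the original. That is one reason to prefer forwarding the original URI untouched over re-encoding it afterwards.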

Tomasz Nurkiewicz

2 Answers


Quoting Valentin V. Bartenev (who should get the full credit for this answer):

A quote from documentation:

  • If proxy_pass is specified with URI, when passing a request to the server, part of a normalized request URI matching the location is replaced by a URI specified in the directive

  • If proxy_pass is specified without URI, a request URI is passed to the server in the same form as sent by a client when processing an original request

The correct configuration in your case would be:

location /foo {
   proxy_pass http://localhost:8080;
}
Tomasz Nurkiewicz
  • I had to change `http://localhost:8080/` to `http://localhost:8080` in case anyone has the same situation as I did. – herrtim Aug 22 '13 at 20:23
  • Why does Nginx decode the URI before passing it to the backend server? Wouldn't it make more sense if it kept the URI untouched? – platypus Jan 07 '14 at 07:46
  • @platypus, it is kept untouched, until you explicitly start performing the substitutions – cnst Apr 06 '18 at 21:30

Note that URL decoding, which the nginx documentation calls "normalisation" of $uri, happens before the request reaches the backend if and only if:

  • either any URI is specified within proxy_pass itself, even if just the trailing slash all by itself,

  • or, URI is changed during the processing, e.g., through rewrite.


Both conditions are explicitly documented at http://nginx.org/r/proxy_pass (emphasis mine):

  • If the proxy_pass directive is specified with a URI, then when a request is passed to the server, the part of a normalized request URI matching the location is replaced by a URI specified in the directive

  • If proxy_pass is specified without a URI, the request URI is passed to the server in the same form as sent by a client when the original request is processed, or the full normalized request URI is passed when processing the changed URI
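
The two documented cases can be sketched as a toy Python model. This is an illustration under simplifying assumptions, not nginx's implementation: `upstream_path` is a hypothetical helper, it handles paths only (no query string), it treats normalization as plain percent-decoding, and it ignores any other processing nginx applies to the URI.

```python
from typing import Optional
from urllib.parse import unquote


def upstream_path(request_uri: str, location: str,
                  proxy_pass_uri: Optional[str]) -> str:
    """Toy model of the two documented proxy_pass cases.

    proxy_pass_uri is the URI part of the proxy_pass directive
    (e.g. "/foo"), or None when the directive has no URI part.
    Assumes the normalized request URI starts with `location`.
    """
    if proxy_pass_uri is None:
        # Without a URI: forwarded in the form the client sent it.
        return request_uri
    # With a URI: the part of the *normalized* URI matching the
    # location is replaced by the URI from the directive.
    normalized = unquote(request_uri)
    return proxy_pass_uri + normalized[len(location):]


# Configuration from the question: proxy_pass http://localhost:8080/foo;
with_uri = upstream_path("/foo/%5B-%5D", "/foo", "/foo")

# Configuration from the accepted answer: proxy_pass http://localhost:8080;
without_uri = upstream_path("/foo/%5B-%5D", "/foo", None)

print(with_uri)
print(without_uri)
```

In this model the only difference between the two configurations is whether the decoded or the original form is used as the starting point, which matches the behaviour observed in the question.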


The solution is to either omit the URI, as in the OP's case, or, indeed, use a clever rewrite rule:

# map `/foo` to `/foo`:
location /foo {
    proxy_pass  http://localhost:8080;  # no URI -- not even just a slash
}

# map `/foo` to `/bar`:
location /foo {
    rewrite  ^  $request_uri;            # get original URI
    rewrite  ^/foo(/.*)  /bar$1  break;  # drop /foo, put /bar
    return 400;   # if the second rewrite won't match
    proxy_pass    http://localhost:8080$uri;
}
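
To show why the `return 400` fallback is there, the control flow of the second location can be modelled in Python. This is a rough sketch, not nginx itself: `rewrite_foo_to_bar` is a hypothetical helper, it looks only at the path, and it assumes the location was selected against the decoded URI while the rewrite runs against the raw one.

```python
import re
from typing import Optional


def rewrite_foo_to_bar(request_uri: str) -> Optional[str]:
    """Return the path to forward upstream, or None for the 400 case."""
    # rewrite ^ $request_uri;  (continue with the raw, undecoded URI)
    uri = request_uri
    # rewrite ^/foo(/.*) /bar$1 break;
    match = re.match(r"^/foo(/.*)", uri)
    if match is None:
        # return 400;  (reached only when the rewrite above did not match)
        return None
    # proxy_pass http://localhost:8080$uri;
    return "/bar" + match.group(1)


# The escapes survive because the regex ran against the raw URI.
print(rewrite_foo_to_bar("/foo/%5B-%5D"))

# "%66" decodes to "f", so the location /foo would still be selected,
# but the raw URI does not match ^/foo and falls through to the 400.
print(rewrite_foo_to_bar("/%66oo/x"))
```

Without the fallback, a request whose raw form does not match the regex would reach `proxy_pass` with a URI that the rewrite never validated.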

You can see it live in a related Stack Overflow answer, including control group.

cnst
  • The documentation is confusing here. Both forms contain a URI. It is the _path component_ that is present in one and missing in the other. – Michael Hampton Apr 07 '18 at 16:36
  • @MichaelHampton, I disagree — the PATH is generally called the URI, so, the one without the path, doesn't contain the URI. – cnst Apr 07 '18 at 18:39
  • A relative path alone can also be a valid URL, of course. The point is, the remainder is also a valid URI (e.g. `http://localhost:8080`). If you disagree, you can take it up with the authors of RFC 3986. – Michael Hampton Apr 07 '18 at 18:46
  • @MichaelHampton Unfortunately, it seems scheme and path are mandatory for a URI; authority, arguments, and fragment are optional – Norman Xu Dec 04 '18 at 14:58
  • Is it just me or is the standard behaviour wacky? We don't want URLs changed just because we happen to rewrite to a path instead of to the root!! – Marc Jun 15 '20 at 12:27
  • @Marc just you. The standard behaviour is to preemptively address many security pitfalls, and ensure you can't blame your security issues on nginx. P.S. Did you notice the `return 400` in this answer? I bet most folks don't bother to understand what it's for, or deem it unnecessary, even though it's pretty essential for security. – cnst Jun 15 '20 at 15:33
  • If I pass `/foo%20bar` to NGINX and it passes literally `/yo/foo bar` (an invalid URL containing a space) downstream which then fails then the behaviour is wrong/buggy. See https://trac.nginx.org/nginx/ticket/1930 – Marc Jun 16 '20 at 05:56
  • @Marc no, you're incorrect, and your comment is very misleading — nginx will never pass a space upstream if you use the correct configuration as has been pointed out in that [trac issue you link to](https://trac.nginx.org/nginx/ticket/1930); you have an incorrect use of regular expression captures that's causing your problem; your configuration sample is not the best practice even if it'd have worked as you may expect; I agree 100% with the nginx devs in that trac issue that the defect report is invalid. – cnst Jun 16 '20 at 18:14
  • OK, I will look at the suggestions there. I still think there should be a way to get escaped URL components like `foo%20bar` - NGINX seems to think we only need unescaped values `foo bar`. – Marc Jun 17 '20 at 05:09
  • @Marc again, your statement is incorrect; the devs have pointed out where your problem was and what the correct solution and best practice should be; you never explained why the proposed solution wouldn't work for your usecase; so, frankly, I don't even understand what you're trying to argue here anymore, because your proposed solution (that would let you use configuration that you've been already told is suboptimal in the first place) would break other usecases. – cnst Jun 17 '20 at 21:30
  • That's an arrogant statement. I have taken their feedback onboard. Perhaps you can explain why you think it makes sense to decode URL elements into `$1`? I certainly have illustrated cases where it is a problem and BREAKS HTTP and don't see anyone offering examples where this is a good idea. Why would NGINX decode URIs into variables?? How can we re-encode them?? – Marc Jun 18 '20 at 05:52
  • Marc, your configuration is just wrong. It has been explained in Trac, as well as here, several times. The proper solution has been explained as well; again, you never once indicated why the proper solution that has been suggested wouldn't work for you. If you don't want to follow the proper solution, that doesn't mean that it's nginx that's broken. Please stop posting misleading statements about nginx. What do you think happens when nginx receives a request for `GET /../../../../../../etc/passwd`? Which non-regex `location` would catch it? What about `GET /this%20is%20a%20test.txt`? – cnst Jun 18 '20 at 22:38