External links URL encoding leads to '?' a

2019-01-26 05:58发布

问题:

I got a problem with my server. I got four inbound links to different sites of my dynamic webpage which look something like this:

myurl.com/default/Site%3Fid%3D13

They should look like this:

myurl.com/default/Site?id=13

I do know that those %3F is an escape sequence for the ? sign and the %3D is an escape sequence for the equal sign. But I do get an error 400 when I use those links. What can I do about that?

The four links are for different sites, and I imagine over time there will be more links like that. So one fix for all would be perfect.

回答1:

An exact same question was actually asked on nginx-ru mailing list about a year ago:

http://mailman.nginx.org/pipermail/nginx-ru/2013-February/050200.html

The most helpful response, by an Nginx, Inc, employee/developer, Валентин Бартенев:

http://mailman.nginx.org/pipermail/nginx-ru/2013-February/050209.html

Если запрос приходит в таком виде, то это уже не параметры, а имя запрошенного файла. Другое дело, что location ищется по уже раскодированному адресу, о чем в документации написано.

Translation:

If the request comes in such a form, then these are no longer the args, but the name of the requested file. Another thing is that, as documented, the location matching is performed against a normalised URI.

His suggested solution, translated to the sample example from the question here at SO, would then be:

location /default/Site? {
    rewrite \?(.*)$ /default/Site?$1? last;
}

location = /default/Site {
    [...]
}


回答2:

The following sample would redirect all wrongly-looking requests (defined as having ? in the requested filename — encoded as %3F in the request) into less wrongly-looking ones, regardless of URL.

(Please note that, as rightly advised elsewhere, you should not be getting these wrongly-formed links in the first place, so, use it as a last resort — only when you cannot correct the wrongly formed links otherwise, and you do know that such requests are attempted by valid agents.)

server {
    listen      [::]:80;
    server_name localhost;

    rewrite     ^/([^?]*)\?(.*)$    /$1?$2?     permanent;
    location / {
        return  200 "id is $arg_id\n";
    }
}

This is example of how it would work — when a wrongly looking request is encountered, a correction attempt is made with a 301 Moved Permanently response with a supposedly correct Location response header, which would make the browser automatically re-issue the request to the newly provided location:

opti# curl -6v "http://localhost/default/Site%3Fid%3D13"
* About to connect() to localhost port 80 (#0)
*   Trying ::1...
* connected
* Connected to localhost (::1) port 80 (#0)
> GET /default/Site%3Fid%3D13 HTTP/1.1
> User-Agent: curl/7.26.0
> Host: localhost
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Server: nginx/1.4.1
< Date: Wed, 15 Jan 2014 17:09:25 GMT
< Content-Type: text/html
< Content-Length: 184
< Location: http://localhost/default/Site?id=13
< Connection: keep-alive
<
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/1.4.1</center>
</body>
</html>
* Connection #0 to host localhost left intact
* Closing connection #0

Note that no correction attempts are made on proper-looking requests:

opti# curl -6v "http://localhost/default/Site?id=13"
* About to connect() to localhost port 80 (#0)
*   Trying ::1...
* connected
* Connected to localhost (::1) port 80 (#0)
> GET /default/Site?id=13 HTTP/1.1
> User-Agent: curl/7.26.0
> Host: localhost
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.4.1
< Date: Wed, 15 Jan 2014 17:09:30 GMT
< Content-Type: application/octet-stream
< Content-Length: 9
< Connection: keep-alive
<
id is 13
* Connection #0 to host localhost left intact
* Closing connection #0


回答3:

The URL is perfectly valid. The escaped characters it contains are just that, escaped. Which is perfectly fine.

The purpose is that you can actually have a request name (in most cases corresponding to the filename on the disk) that is Site?id=13 and not Site and the rest as the query string.

I would consider it bad practice to have characters in a filename that makes this necessary. However, in URL arguments it may very well be necessary.

Nevertheless, the request URL is valid, and probably not what you want it to be. Which consequently suggest that you should correct the error wherever anybody has picked up the wrong URL in the first place.

I do not really understand why you get an error 400; you should rather get an error 404. But that depends on your setup.

There are also cases, especially with nginx, that mostly involve passing on whole URLs and URL parts along multiple levels (for example reverse proxies, matching regular expressions from the URL and using them as variables, etc.) where such an error may occur. But to verify this and fix it we would need to know more about your setup.