Redirect loop using Apache mod_rewrite (clean URLs)

Posted 2019-09-21 14:49

Question:

My situation is very similar to the one in this question (in fact, the code is very similar). I've been trying to create a .htaccess file to use URLs without file extensions so that, e.g., https://example.com/file finds file.html in the appropriate directory, but also so that https://example.com/file.html redirects (using an HTTP redirect) to https://example.com/file, so there is only one canonical URL. With the following .htaccess:

Options +MultiViews
RewriteEngine On

# Redirect <...>.php, <...>.html to <...> (without file extension)
RewriteRule ^(.+)\.(php|html)$ /$1 [L,R]

I've been running into a redirect loop just as in the question mentioned above. (In my case, finding the corresponding file is achieved by MultiViews instead of a separate RewriteRule.)

However, with a solution adapted from this answer:

Options +MultiViews
RewriteEngine On

# Redirect <...>.php, <...>.html to <...> (without file extension)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s(.+)\.(php|html)
RewriteRule ^ %1 [L,R]

there is no redirect loop. I’d be interested to find out where the difference comes from. Aren’t both files functionally equivalent? How come using a “normal” RewriteRule creates a loop, while using %{THE_REQUEST} doesn’t?

Note that I’m not looking for a way to get clean URLs (I could just use the second version of my file or the answer to the question linked above, which looks at %{ENV:REDIRECT_STATUS}), but for the reason why these two approaches work/don’t work, so this is not the same question as the one linked above.

Note: I'm seeing the same problem using only mod_rewrite (without MultiViews), so it doesn't seem to be due to the order of execution of MultiViews and mod_rewrite:

Options -MultiViews
RewriteEngine On

## Redirect <...>.php, <...>.html to <...> (without file extension)
# This works...
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s(.+)\.(php|html)
RewriteRule ^ %1 [L,R]
# But this doesn’t!
#RewriteRule ^(.+)\.(php|html)$ /$1 [L,R]

# Find file with file extension .php or .html on the filesystem for a URL
# without file extension
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^ %{REQUEST_FILENAME}.php [L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^ %{REQUEST_FILENAME}.html [L]

Where’s the difference? I would expect both approaches to work because the internal rewrite to a file is at the very end of the .htaccess with an [L] flag, so there shouldn't be any processing or redirecting happening afterwards, right?

Answer 1:

If you look at the RewriteRule directive's documentation, you'll notice the following:

On the first RewriteRule, it is matched against the (%-decoded) URL-path of the request, or, in per-directory context (see below), the URL path relative to that per-directory context. Subsequent patterns are matched against the output of the last matching RewriteRule.
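To make the quoted passage concrete, here's a rough Python sketch of what a per-directory pattern sees: the .htaccess directory prefix is stripped and the question's pattern is applied to the rest. The directory /var/www/html/ and the helper names are hypothetical, and this is a simplified model, not mod_rewrite's actual implementation.

```python
import re

# Hypothetical .htaccess location; in per-directory context this prefix is stripped.
HTACCESS_DIR = "/var/www/html/"

# The pattern from the question's redirect rule.
PATTERN = re.compile(r"^(.+)\.(php|html)$")


def per_dir_path(filename: str) -> str:
    """Return the path relative to the .htaccess directory (no leading slash)."""
    if filename.startswith(HTACCESS_DIR):
        return filename[len(HTACCESS_DIR):]
    return filename


def apply_redirect_rule(filename: str):
    """Return the substitution '/$1' if the pattern matches, else None."""
    m = PATTERN.match(per_dir_path(filename))
    return "/" + m.group(1) if m else None


if __name__ == "__main__":
    print(apply_redirect_rule("/var/www/html/file.html"))  # /file
    print(apply_redirect_rule("/var/www/html/file"))       # None
```

Because the pattern only sees the current path, it matches file.html regardless of whether that path came from the client's URL or from an earlier internal mapping.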

Since the pattern is matched in per-directory context, once you add the following:

RewriteRule ^(.+)\.(php|html)$ /$1 [L,R]

the REQUEST_URI changes with each redirect, and mod_rewrite processes the URI again. MultiViews maps the redirected URL (e.g. /file) back to the matching file (file.html), the rule matches that .html path and redirects once more, and so the request loops (the URI changes on every rewrite).

Now, when you match against THE_REQUEST instead, the URI may still change on internal rewrites, but THE_REQUEST holds the actual request line as received by the server, which never changes unless the client is redirected.
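The following toy Python model (not Apache code) replays this answer's scenario, assuming that MultiViews has already mapped /file to file.html by the time the .htaccess rule runs. FILES, MAX_REDIRECTS and all function names are hypothetical; browse() follows redirects the way a browser would.

```python
import re
from typing import Optional

# Toy model, not Apache code. Hypothetical document root containing only file.html.
FILES = {"/file.html"}
# Assumed number of redirects a browser follows before giving up (browsers differ).
MAX_REDIRECTS = 20

# The two variants from the question.
RULE_PATTERN = re.compile(r"^(.+)\.(php|html)$")               # RewriteRule pattern
THE_REQUEST_COND = re.compile(r"^[A-Z]{3,}\s(.+)\.(php|html)")  # RewriteCond on THE_REQUEST


def multiviews(uri: str) -> str:
    """Simplified MultiViews: map /file to /file.html when only the latter exists."""
    if uri not in FILES and uri + ".html" in FILES:
        return uri + ".html"
    return uri


def handle(request_line: str, use_the_request: bool) -> Optional[str]:
    """Process one client request; return a redirect target, or None if served."""
    uri = request_line.split()[1]
    if use_the_request:
        # Tests the original request line, which MultiViews never changes.
        m = THE_REQUEST_COND.match(request_line)
        return m.group(1) if m else None
    # Tests the path MultiViews resolved, relative to the .htaccess directory.
    path = multiviews(uri).lstrip("/")
    m = RULE_PATTERN.match(path)
    return "/" + m.group(1) if m else None


def browse(url: str, use_the_request: bool) -> list:
    """Follow redirects like a browser; return the URLs requested."""
    visited = [url]
    for _ in range(MAX_REDIRECTS):
        target = handle(f"GET {url} HTTP/1.1", use_the_request)
        if target is None:
            return visited
        url = target
        visited.append(url)
    raise RuntimeError("too many redirects: " + " -> ".join(visited[:3]) + " -> ...")
```

In this model, browse("/file", False) raises the "too many redirects" error, browse("/file", True) is served directly, and browse("/file.html", True) is redirected once to /file.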



Answer 2:

# But this doesn’t!
#RewriteRule ^(.+)\.(php|html)$ /$1 [L,R]

The reason this commented-out rule doesn't work and causes a loop is that your other rule adds the .html extension and changes the %{REQUEST_URI} variable to /file.html, which causes this rule to execute again. Stripping .html with this rule then causes the other rule to fire again. Because this rule issues an external redirect ([R]), each round trip goes back through the browser, and this goes on until the browser gives up with a "too many redirects" error.

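You also need to understand that in .htaccess (per-directory) context, an internal rewrite hands the rewritten request back to Apache and the ruleset is run again from the start; the [L] flag only ends the current pass. Processing only settles when a pass no longer rewrites the URL, and since these two rules keep triggering each other, that never happens.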

The rule based on THE_REQUEST works because THE_REQUEST holds the original request line sent by the client (e.g. GET /file HTTP/1.1) and isn't overwritten when other rewrite rules are executed.
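A toy Python model of the question's -MultiViews setup shows both effects described above: [L] ends only the current pass, an internal rewrite starts a new pass, and the redirect goes back to the client. It is a simplified sketch that assumes only the .html branch (no .php rule, no directory checks); FILES, the limits and all function names are hypothetical, and MAX_INTERNAL_PASSES merely stands in for Apache's LimitInternalRecursion (default 10).

```python
import re
from typing import Tuple

# Toy model, not Apache code. Hypothetical files, relative to the .htaccess directory.
FILES = {"file.html"}
MAX_REDIRECTS = 20        # assumed browser limit (browsers differ)
MAX_INTERNAL_PASSES = 10  # stands in for LimitInternalRecursion (default 10)

REDIRECT_RULE = re.compile(r"^(.+)\.(php|html)$")
THE_REQUEST_COND = re.compile(r"^[A-Z]{3,}\s(.+)\.(php|html)")


def one_pass(path: str, the_request: str, use_the_request: bool) -> Tuple[str, str]:
    """Run the ruleset once; with [L], the first matching rule ends this pass."""
    # Redirect rule with [L,R]
    if use_the_request:
        m = THE_REQUEST_COND.match(the_request)
        if m:
            return "redirect", m.group(1)
    else:
        m = REDIRECT_RULE.match(path)
        if m:
            return "redirect", "/" + m.group(1)
    # Internal rewrite with [L] (only the .html branch, no -d check)
    if path not in FILES and path + ".html" in FILES:
        return "rewrite", path + ".html"
    return "serve", path


def handle(url: str, use_the_request: bool) -> Tuple[str, str]:
    """Process one client request; an internal rewrite re-runs the whole ruleset."""
    the_request = f"GET {url} HTTP/1.1"  # fixed for the lifetime of this request
    path = url.lstrip("/")
    for _ in range(MAX_INTERNAL_PASSES):
        action, value = one_pass(path, the_request, use_the_request)
        if action != "rewrite":
            return action, value  # "redirect" goes back to the client
        path = value  # rewritten path is fed into the next pass
    raise RuntimeError("internal recursion limit reached")


def browse(url: str, use_the_request: bool):
    """Follow redirects like a browser; return (URLs requested, file served)."""
    visited = [url]
    for _ in range(MAX_REDIRECTS):
        action, value = handle(url, use_the_request)
        if action == "serve":
            return visited, value
        url = value
        visited.append(url)
    raise RuntimeError("too many redirects")
```

With use_the_request=False, each request for /file is rewritten to file.html on the first pass and redirected back to /file on the second, so browse("/file", False) never settles. With use_the_request=True, the second pass sees the unchanged THE_REQUEST, no redirect fires, and file.html is served.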