Removing 'index.html' from url and adding 'www'

Posted 2019-02-15 07:13

Question:

To remove index.html or index.htm from URLs, I use the following in my .htaccess:

RewriteCond %{REQUEST_URI} /index\.html?$ [NC]
RewriteRule ^(.*)index\.html?$ "/$1" [NC,R=301,NE,L]

This works! (More info about flags at the end of this question *)

Then, to add www to URLs, I use the following in my .htaccess:

RewriteCond %{HTTP_HOST} !^www\.mydomain\.com$ [NC]
RewriteRule ^(.*)$ "http://www.mydomain.com/$1" [R=301,NE,L]

This works too!

The question is how to avoid the double redirection that the rules above create in cases like this one:

  1. the browser asks for http://mydomain.com/path/index.html
  2. the server sends a 301 header to redirect the browser to http://mydomain.com/path/
  3. the browser then requests http://mydomain.com/path/
  4. the server now sends another 301 header to redirect the browser to http://www.mydomain.com/path/

This is obviously not very smart: a user who asks for http://mydomain.com/path/index.html is redirected twice, so the page feels slow to load. Moreover, Googlebot might stop following the link because of the double redirection (I'm not sure about this last point and I don't want to get into a discussion on it; it's just another possible issue).

Thanks!


* For anyone interested:

  • NC is used so that uppercased file names are redirected too, e.g. INDEX.HTML / InDeX.HtM
  • NE is used to avoid double URL encoding; without it, http://.../index.html?hello=ba%20be would be redirected to http://.../index.html?hello=ba%2520be
  • QSA is used so that query strings are carried over too, e.g. http://.../index.html?hello=babe to http://.../?hello=babe (not needed, thanks to anubhava's answer)

Answer 1:

To avoid the double redirection, add another rule to the .htaccess file that handles both conditions at once, like this:

Options +FollowSymlinks -MultiViews
RewriteEngine on

# Non-www host AND index.html/index.htm: fix both in one redirect
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{REQUEST_URI} ^(.*/)index\.html?$ [NC]
RewriteRule . http://www.%{HTTP_HOST}%1 [R=301,NE,L]

# Non-www host only: add the www prefix
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule . http://www.%{HTTP_HOST}%{REQUEST_URI} [NE,R=301,L]

# index.html/index.htm only: strip the file name
RewriteCond %{REQUEST_URI} ^(.*/)index\.html?$ [NC]
RewriteRule . %1 [R=301,NE,L]

So if the input URL is http://mydomain.com/path/index.html, both conditions of the first rule are satisfied and there is a single redirect (301) to http://www.mydomain.com/path/.

Also, I believe the QSA flag is not really needed above, since the rules do NOT manipulate the query string; when the substitution contains no query string, mod_rewrite passes the original one through unchanged.



Answer 2:

A better solution would be to place the index.html rule ahead of the www rule and, inside the index.html rule, ADD the www prefix to the destination URL. This way someone requesting http://domain.com/index.html is sent to http://www.domain.com/ by the FIRST rule. The second (www) rule then only applies when the URL has no index.html AND the host lacks www, which is again only one redirect.
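
This answer gives no rules, so here is a minimal sketch of what it describes. It assumes the .htaccess file sits in the document root and that the canonical host is www.mydomain.com, as in the question; the optional `(.*/)?` group is my own choice so that a root-level /index.html is covered as well.

```apache
RewriteEngine On

# Rule 1: strip index.html / index.htm and send the visitor straight to
# the www host, so both fixes happen in a single 301.
# (Assumed canonical host: www.mydomain.com, taken from the question.)
RewriteRule ^(.*/)?index\.html?$ http://www.mydomain.com/$1 [NC,R=301,NE,L]

# Rule 2: any remaining request on a non-www host only needs the prefix.
RewriteCond %{HTTP_HOST} !^www\.mydomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [R=301,NE,L]
```

With these two rules, http://mydomain.com/path/index.html should be handled by rule 1 alone and http://mydomain.com/path/ by rule 2 alone, so neither case needs a second redirect.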



Answer 3:

Remove the L flag from the first rule? L forces rule processing to stop as soon as the rule matches, so the first rewritten URL is sent without the second rule ever being applied.

The rules are applied sequentially from top to bottom, each rewriting the URL again if it matches the rule's conditions and pattern.

RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [R=301]

RewriteRule ^(.*/)index\.html?$ $1 [NC,QSA,R=301,NE,L]

Hence the above will first add the www and then remove index.html (or index.htm) before sending the new URL: a single redirect for all the rules.