In order to remove index.html or index.htm from URLs, I use the following in my .htaccess:
RewriteCond %{REQUEST_URI} /index\.html?$ [NC]
RewriteRule ^(.*)index\.html?$ "/$1" [NC,R=301,NE,L]
This works! (More info about flags at the end of this question *)
Then, in order to add www to URLs, I use the following in my .htaccess:
RewriteCond %{HTTP_HOST} !^www\.mydomain\.com$ [NC]
RewriteRule ^(.*)$ "http://www.mydomain.com/$1" [R=301,NE,L]
This works too!
The question is how to avoid the double redirect these rules produce in cases like this one:
- the browser asks for http://mydomain.com/path/index.html
- the server sends a 301 header redirecting the browser to http://mydomain.com/path/
- the browser then requests http://mydomain.com/path/
- the server now sends another 301 header redirecting the browser to http://www.mydomain.com/path/
This is obviously not very smart, because a user who asks for http://mydomain.com/path/index.html is redirected twice and will feel the page loads slowly. Moreover, Googlebot might stop following the link because of the double redirect (I'm not sure about this last point and I don't want to start a discussion on it; it's just another possible issue).
Thanks!
* For anyone interested:
NC is used so that uppercase file names, e.g. INDEX.HTML / InDeX.HtM, are redirected too.
NE is used to avoid double URL encoding, so that http://.../index.html?hello=ba%20be is not redirected to http://.../index.html?hello=ba%2520be
QSA is used to redirect query strings too, e.g. http://.../index.html?hello=babe to http://.../?hello=babe (not needed, thanks to anubhava's answer)
To avoid the double redirect, add another rule to the .htaccess file that handles both conditions, like this:
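Options +FollowSymlinks -MultiViews
RewriteEngine on

# Host lacks www AND the URL ends in index.html or index.htm: fix both in one redirect.
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{REQUEST_URI} ^(.*/)index\.html?$ [NC]
RewriteRule ^ http://www.%{HTTP_HOST}%1 [R=301,NE,L]

# Host lacks www only: add it and keep the path (^ also matches the empty path of a root request).
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [NE,R=301,L]

# Host already has www: just strip index.html or index.htm.
RewriteCond %{REQUEST_URI} ^(.*/)index\.html?$ [NC]
RewriteRule ^ %1 [R=301,NE,L]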
So if the input URL is http://mydomain.com/path/index.html, both conditions are satisfied in the first rule and there is a single redirect (301) to http://www.mydomain.com/path/.
Also, I believe the QSA flag is not really needed above, since you are NOT manipulating the query string.
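To illustrate when QSA matters, here is a minimal sketch. The paths and the show.php script are hypothetical and not part of the question. When the substitution contains no "?", mod_rewrite carries the original query string over unchanged. QSA only comes into play when the substitution sets a query string of its own.

```apache
RewriteEngine On

# No "?" in the substitution: /old/page?hello=babe keeps ?hello=babe automatically.
RewriteRule ^old/(.*)$ /new/$1 [R=301,L]

# The substitution defines its own query string, which would replace the original one.
# QSA appends the original, so /item/42?hello=babe becomes /show.php?id=42&hello=babe.
RewriteRule ^item/([0-9]+)$ /show.php?id=$1 [QSA,L]
```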
A better solution would be to place the index.html rule ahead of the www rule and, inside the index.html rule, add the www prefix to the destination URL. This way someone requesting http://domain.com/index.html is sent to http://www.domain.com/ by the first rule. The second (www) rule then only applies when the URL has no index.html and the host is missing www, which is again only one redirect.
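A rough, untested sketch of that ordering, built from the two rules in the question. It assumes the placeholder host www.mydomain.com and plain HTTP, as in the question, and the only change to the index rule is the absolute www destination.

```apache
RewriteEngine On

# Rule 1: strip index.html / index.htm and send the browser straight to the www host.
RewriteCond %{REQUEST_URI} /index\.html?$ [NC]
RewriteRule ^(.*)index\.html?$ "http://www.mydomain.com/$1" [NC,R=301,NE,L]

# Rule 2: only reached when rule 1 did not match; add www and keep the path.
RewriteCond %{HTTP_HOST} !^www\.mydomain\.com$ [NC]
RewriteRule ^(.*)$ "http://www.mydomain.com/$1" [R=301,NE,L]
```

With this ordering, a request for http://mydomain.com/path/index.html should be answered by rule 1 with a single 301 to http://www.mydomain.com/path/, while http://mydomain.com/path/ falls through to rule 2.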
Remove the L flag from the prior rule? L forces rule processing to stop when the rule matches, so the first rewritten URL is sent without applying the second rule.
The rules are applied sequentially from top to bottom, each rewriting the URL again if it matches the rule's conditions and pattern.
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [R=301]
RewriteRule ^(.*/)index\.html?$ $1 [NC,QSA,R=301,NE,L]
Hence the above will first add the www and then remove the index.html (or index.htm) before sending the new URL: a single redirect for all the rules.