Bot-blocking code ignored in htaccess?

Posted 2020-04-19 05:54

Question:

I've been trying to solve this for several days now, but can't find an answer. On a shared hosting account I'm using, I'd like to modify the .htaccess file to block certain bots from visiting the site. This is the code I've used:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
SetEnvIfNoCase User-Agent .*dotbot.* bad_bot
SetEnvIfNoCase User-Agent .*gigabot.* bad_bot
SetEnvIfNoCase User-Agent .*ahrefsbot.* bad_bot
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

It's a WordPress site. Strangely enough, it seems that it's just the bot blocking part that is being ignored. I've tried using a 302 redirect and it worked fine, so the file is being read and processed.

I've also noticed that this code works on some sites but not on others. Could it have something to do with this being an addon domain? Then again, the code isn't working on some primary domains either.

The .htaccess file (together with the domain and WordPress installation) is located in home/maindomain/addondomain and not in home/maindomain/public_html. There are other .htaccess files in the directories for the other domains I host there, but only at the same hierarchical level as this one, for example:

folder1/.htaccess

folder2/.htaccess

thisfolder/.htaccess

But none of the others have any lines of code dealing with bots, so I don't think there should be any interference from them.

I've also tried using different syntax with no success, for example:

RewriteCond %{HTTP_USER_AGENT} .*dotbot.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*gigabot.* [OR]
RewriteCond %{HTTP_USER_AGENT} .*ahrefsbot.* [OR]
RewriteRule ^(.*)$ http://www.example.com/ [L,R=301]

Another thing I tried was moving the bot-blocking parts below the WordPress-specific code. I also tried changing the placement of RewriteBase /.

I got a reply to my last question saying that it could be done via robots.txt. I'd rather not do that though as it would defeat the purpose I'm trying to achieve.

Another answer I received suggested removing the .* before and after the bot names. The names I've listed are only part of the full name, so I thought I'd use these wildcards. Or do they get added automatically with these commands?

Answer 1:

Do you know that you can control most of those bots with a robots.txt file? It's a much better way of telling those bots not to visit certain parts of your site: a bot that honors robots.txt won't even attempt to fetch a disallowed URL, so there's nothing left for you to block.
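
As a rough sketch of what that could look like for the three bots named in the question: the User-agent tokens below are assumptions taken from the names in the question, so check each crawler's own documentation for the exact token it matches. The file would sit at the root of the domain as robots.txt, and it is only advisory, so it affects crawlers that choose to obey it.

```
# robots.txt, served from the site root
# User-agent tokens are assumed from the bot names in the question.
User-agent: dotbot
User-agent: gigabot
User-agent: AhrefsBot
# Disallow the whole site for the agents listed above.
Disallow: /
```

Listing several User-agent lines above a single Disallow applies the same rule to all of them; splitting them into separate groups would work just as well.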