Stop Google from indexing

Posted 2019-01-16 13:40

Is there a way to stop Google from indexing a site?

9 answers
Reply #2 · 2019-01-16 14:04

There are several ways to stop crawlers, including Google's, from crawling and indexing your website.

At server level through header

Header set X-Robots-Tag "noindex, nofollow"

At root domain level through robots.txt file

User-agent: *
Disallow: /

At page level through robots meta tag

<meta name="robots" content="noindex, nofollow" />

However, if the problem is only that your website has outdated or no longer existing pages/URLs, you can wait for some time: Google will automatically deindex those URLs on a later crawl. See https://support.google.com/webmasters/answer/1663419?hl=en

SAY GOODBYE
Reply #3 · 2019-01-16 14:04

Bear in mind that Microsoft's crawler for Bing, despite their claim to obey robots.txt, does not always do so.

Our server stats indicate that they have a number of IPs running crawlers that do not obey robots.txt, as well as a number that do.

戒情不戒烟
Reply #4 · 2019-01-16 14:11

I have to add my answer here, as the accepted answer doesn't really address the problem properly. Also remember that preventing Google from crawling doesn't mean you can keep your content private.

My answer is based on a few sources:
https://developers.google.com/webmasters/control-crawl-index/docs/getting_started
https://sites.google.com/site/webmasterhelpforum/en/faq--crawling--indexing---ranking

The robots.txt file controls crawling, but not indexing! Those two are completely different actions, performed separately. Some pages may be crawled but not indexed, and some may even be indexed but never crawled. A link to a non-crawled page may exist on other websites, which can lead Google to index the URL anyway.

The question is about indexing, which is gathering data about a page so that it can appear in search results. It can be blocked by adding a meta tag:

<meta name="robots" content="noindex" />

or by adding an HTTP header to the response:

X-Robots-Tag: noindex
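
To check that a page really carries one of these two signals, something like the following Python sketch can help. The names is_noindex and RobotsMetaParser are made up for this illustration, and the sketch is simplified: it ignores user-agent-specific forms such as "googlebot: noindex" and the "none" shorthand.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def is_noindex(headers, html):
    """Return True if the X-Robots-Tag header or a robots meta tag says noindex.

    `headers` is a plain dict of response headers, `html` is the page body.
    Both names are hypothetical helpers for this example.
    """
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = {p.strip().lower() for p in header_value.split(",")}

    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives
```

Called with the meta tag above it reports noindex, while a page that only has content="nofollow" does not count as blocked from indexing.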

If the question is about crawling, then of course you could create a robots.txt file and put the following lines in it:

User-agent: *
Disallow: /

Crawling is the action performed to gather information about the structure of one specific website. E.g. you've added the site through Google Webmaster Tools. The crawler will take it into account and visit your website, looking for robots.txt. If it doesn't find one, it will assume that it can crawl anything (it's very important to have a sitemap.xml file as well, to help with this operation and to specify priorities and change frequencies). If it finds the file, it will follow the rules. After a successful crawl it will at some point run indexing for the crawled pages, but you can't tell when...
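
As a rough illustration of how a well-behaved crawler reads these rules, here is a small sketch using Python's standard urllib.robotparser. The helper name can_crawl and the example.com URL are made up for this example, and real crawlers may interpret robots.txt somewhat differently.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, url, user_agent="*"):
    """Return whether `user_agent` may fetch `url` under the given robots.txt text.

    `can_crawl` is a hypothetical helper; it parses the rules from a string
    instead of downloading them, so it works offline.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two-line file from this answer: every crawler is kept out of every path.
block_all = "User-agent: *\nDisallow: /\n"

# No rules at all: the parser treats everything as allowed.
no_rules = ""

blocked = can_crawl(block_all, "https://example.com/page.html", "Googlebot")
allowed = can_crawl(no_rules, "https://example.com/page.html", "Googlebot")
```

Here blocked is False and allowed is True. Note that this only answers "may I crawl this URL?"; it says nothing about whether the URL ends up in the index.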

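Important: this all means that your page can still be shown in Google search results regardless of robots.txt. And since a crawler that is blocked by robots.txt never fetches the page, it also never sees a noindex tag or header on it.

I hope at least some users will read this answer and get this clear, as it's crucial to know what actually happens.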

闹够了就滚
Reply #5 · 2019-01-16 14:13

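Use a nofollow meta tag (note that this only tells crawlers not to follow the links on the page; on its own it does not stop the page from being indexed):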

<meta name="robots" content="nofollow" />

To specify nofollow at the link level, add the attribute rel with the value nofollow to the link:

<a href="example.html" rel="nofollow">example</a>
爱情/是我丢掉的垃圾
Reply #6 · 2019-01-16 14:15

You can disable indexing server-wide by adding the setting below globally in the Apache configuration, or use the same directive inside a vhost to disable it for that particular vhost only (mod_headers must be enabled).

Header set X-Robots-Tag "noindex, nofollow"

Once this is done, you can test it by checking the headers Apache returns.

curl -I staging.mywebsite.com
HTTP/1.1 302 Found
Date: Sat, 26 Nov 2016 22:36:33 GMT
Server: Apache/2.4.18 (Ubuntu)
Location: /pages/
X-Robots-Tag: noindex, nofollow
Content-Type: text/html; charset=UTF-8
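
If you want to script that check instead of eyeballing the output, a minimal Python sketch could look like this. The function name robots_directives is an assumption for this example; it takes the text printed by curl -I and returns the X-Robots-Tag directives it finds.

```python
from email.parser import Parser


def robots_directives(raw_head):
    """Extract X-Robots-Tag directives from the text printed by `curl -I`.

    `robots_directives` is a hypothetical helper; it returns a set of
    lowercase directives, or an empty set when the header is missing.
    """
    lines = raw_head.strip().splitlines()
    # The first line is the status line (e.g. "HTTP/1.1 302 Found"), not a header.
    if lines and lines[0].startswith("HTTP/"):
        lines = lines[1:]
    message = Parser().parsestr("\n".join(lines), headersonly=True)

    directives = set()
    # Header names are matched case-insensitively; the header may appear more than once.
    for value in message.get_all("X-Robots-Tag", []):
        directives.update(part.strip().lower() for part in value.split(","))
    return directives
```

Feeding it the response shown above yields the directives noindex and nofollow; a response without the header yields an empty set, which means the Apache setting is not being applied for that URL.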

Emotional °昔
Reply #7 · 2019-01-16 14:16

You can also add the robots meta tag this way:

<head>
<title>...</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>

Another extra layer is to modify .htaccess, but you need to test that carefully.
