Is there a way to make robots ignore certain text?

Posted 2019-01-17 14:52

I have my blog (you can see it if you want, from my profile), and it's fresh, as are Google's parsing results for it.

The results were alarming to me. Apparently the two most common words on my site are "rss" and "feed", because I use text like "Comments RSS" and "Post Feed" for links. These two words appear in every post, while other words are rarer.

Is there a way to make these links disappear from Google's parsing? I don't want technical links getting indexed. I only want content, titles, descriptions to get indexed. I am looking for something other than replacing this text with images.

I found some old discussions on Google dating back to 2007 (I think in three years many things could have changed, hopefully this too).

This question is not about robots.txt and how to make Google ignore pages. It is about making it ignore small parts of a page, or transforming those parts in such a way that they are visible to humans but invisible to robots.

8 Answers
#2 · 2019-01-17 15:35

The only control that you have over indexing robots is the robots.txt file. See this documentation, linked by Google on their page explaining the usage of the file.

You basically can prohibit certain links and URLs, but not specific keywords.
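As a sketch of what that looks like: robots.txt can only exclude whole URLs or path patterns, such as the feed URLs themselves (the paths below are assumptions; adjust them to your blog's actual layout):

```
User-agent: *
Disallow: /feed/
Disallow: /comments/feed/
```

Note this blocks crawling of those URLs entirely; it does nothing about the words "rss" and "feed" appearing in link text on pages that are crawled.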

老娘就宠你
#3 · 2019-01-17 15:39

You would have to manually detect Googlebot from the request's user agent and serve it slightly different content than you normally serve to your users.
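A minimal sketch of that idea (the function name and page strings here are hypothetical, and note the warning in a later answer: serving bots different content than users, known as cloaking, can get a site penalized):

```python
# Sketch: choose the page variant based on the User-Agent header.
# WARNING: this is cloaking; Google may penalize sites that do this.

def select_content(user_agent: str) -> str:
    """Return a stripped-down page for Googlebot, the full page otherwise."""
    full_page = '<p>Post body</p> <a href="/feed">Post Feed</a>'
    stripped_page = '<p>Post body</p>'  # feed/RSS links removed for the bot
    if "Googlebot" in user_agent:
        return stripped_page
    return full_page

print(select_content("Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(select_content("Mozilla/5.0 (Windows NT 10.0)"))
```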

Anthone
#4 · 2019-01-17 15:40

There is a simple way to tell Google not to index parts of your documents: using googleon and googleoff comments (note that these tags are documented for the Google Search Appliance; Google's public web crawler does not officially support them):

<p>This is normal (X)HTML content that will be indexed by Google.</p>

<!--googleoff: index-->

<p>This (X)HTML content will NOT be indexed by Google.</p>

<!--googleon: index-->

In this example, the second paragraph will not be indexed by Google. Notice the “index” parameter, which may be set to any of the following:

  • index — content surrounded by “googleoff: index” will not be indexed by Google

  • anchor — anchor text for any links within a “googleoff: anchor” area will not be associated with the target page

  • snippet — content surrounded by “googleoff: snippet” will not be used to create snippets for search results

  • all — content surrounded by “googleoff: all” has all of the above applied

source

别忘想泡老子
#5 · 2019-01-17 15:42

No, there really isn't anything like that. There are various server-side techniques, but if Google catches you serving up different text to its bot than you give to website visitors it will penalize you.

迷人小祖宗
#6 · 2019-01-17 15:43

I work on a site with a top-3 Google ranking for thousands of school names in the US, and we do a lot of work to protect our SEO. There are three main things you could do (all of which are probably a waste of time; keep reading):

  • Move the stuff you want to downplay to the bottom of your HTML and use CSS to position it where you want readers to see it. This won't hide it from crawlers, but they'll value it lower.
  • Replace those links with images (you say you don't want to do that, but you don't explain why not).
  • Serve a different page to crawlers, with those links stripped. There's nothing black-hat about this, as long as the content is fundamentally the same as what a browser sees. Search engines will ding you if you serve up a page that's significantly different from what users see, but if you stripped RSS links from the version of the page crawlers index, you would not have a problem.
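The first suggestion can be sketched like this: the feed links come last in the source order, and CSS moves them visually back to where readers expect them (the markup and class names are illustrative, not from any particular blog):

```html
<!-- Sketch: feed links appear last in the HTML, so crawlers weight them
     lower, but CSS places the box at the top of the page visually. -->
<div style="position: relative; padding-top: 2em;">
  <p>Main post content, which crawlers encounter first.</p>
  <div style="position: absolute; top: 0;">
    <a href="/feed">Post Feed</a> | <a href="/comments/feed">Comments RSS</a>
  </div>
</div>
```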

That said, crawlers are smart, and yours is not the only site filled with permalink and RSS links. They care about context, and look for terms and phrases in your headings and body text. They know how to determine that your blog is about technology and not about RSS. I highly doubt those links have any negative effect on your SEO. What problem are you actually trying to solve?

If you want to build SEO, figure out what value you provide to readers and write about that. Say interesting things that will lead others to link to your blog, and crawlers will understand that you're an information source that people value. Think more about what your readers see and understand, and less about what you think a crawler sees.

来,给爷笑一个
#7 · 2019-01-17 15:47

Google's crawlers are smart, but the people who program them are smarter. Humans always notice what is sensible on a page; they will spend time on a blog that has nice, rare, and unique content. It all comes down to common sense: how people visit your blog and how much time they spend there. Google measures search results in much the same way. Your page ranking also increases as daily visits increase and as the site's content improves and is updated regularly. This page has the word "Answer" repeated multiple times, but that doesn't mean it won't get indexed; what matters is how useful it is to everyone. I hope this gives you some idea.
