Robots.txt restriction of category URLs

Posted 2020-05-09 16:33

I was unable to find information about my case. I want to keep the following type of URL from being indexed:

website.com/video-title/video-title/

(my website produces such duplicate URL copies of my video articles)

Each video article's URL starts with the word "video".

So what I want to do is block all URLs of the form website.com/any-url/video-any-url

That way I would exclude all the duplicate copies. Could somebody help me?

1 Answer
地球回转人心会变
Answered 2020-05-09 17:19

This is not possible in the original robots.txt specification.

But some parsers may support wildcards in Disallow anyway, for example, Google:

Googlebot (but not all search engines) respects some pattern matching.

So for Google’s bots, you could use the following line:

Disallow: /*/video

This should block any URL whose path starts with anything and then contains "/video", for example:

  • /foo/video
  • /foo/videos
  • /foo/video.html
  • /foo/video/bar
  • /foo/bar/videos
  • /foo/bar/foo/bar/videos
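
In a complete robots.txt file, that Disallow rule has to sit inside a User-agent group. A minimal sketch might look like the following (targeting only Googlebot here is an assumption; you could also use User-agent: * if you accept that non-wildcard parsers will read the pattern literally):

    # Block duplicate video URLs nested under another path segment
    # (relies on Google-style wildcard matching of "*")
    User-agent: Googlebot
    Disallow: /*/video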

Parsers that don't support wildcards would interpret the line literally, i.e., they would block the following URLs:

  • /*/video
  • /*/videos
  • /*/video/foo
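
If you want to sanity-check which of your URLs a rule catches under each interpretation, a rough Python sketch like the one below can help. It emulates Google-style wildcard matching with a regex and contrasts it with a plain literal prefix match; it is an approximation for experimenting, not Google's actual parser:

    import re

    def googlebot_blocks(pattern: str, path: str) -> bool:
        # Rough emulation of Google-style matching: "*" matches any
        # character sequence, "$" anchors the end of the URL, everything
        # else is literal, and the rule matches from the start of the path.
        regex = "".join(
            ".*" if c == "*" else "$" if c == "$" else re.escape(c)
            for c in pattern
        )
        return re.match(regex, path) is not None

    def literal_blocks(pattern: str, path: str) -> bool:
        # Original-spec interpretation: the rule is a literal path prefix.
        return path.startswith(pattern)

    for path in ["/video-title/", "/video-title/video-title/",
                 "/foo/videos", "/*/video"]:
        print(path,
              googlebot_blocks("/*/video", path),
              literal_blocks("/*/video", path))

Under the wildcard interpretation, the nested duplicate /video-title/video-title/ and /foo/videos come back as blocked while the canonical /video-title/ stays crawlable; under the literal interpretation, only a path that really starts with /*/video would be blocked.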