I am creating two pages on my site that are very similar but serve different purposes. One is to thank users for leaving a comment and the other is to encourage users to subscribe.
I don't want duplicate content, but I do want both pages to be available. Can I use the sitemap to hide one of them? Or would I do this in the robots.txt file?
The disallow looks like this:
Disallow: /wp-admin
How would I customize this to disallow a specific page?
This is very simple: for any page you want to disallow, just give the root-relative URL of that file or folder, and put the rule into your robots.txt file.
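As a minimal sketch, assuming (hypothetically) that the page you want to hide lives at /thank-you-for-commenting, the file could look like this:

```
User-agent: *
Disallow: /thank-you-for-commenting
```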
robots.txt rules are matched as URL-path prefixes, not regular expressions, though major crawlers such as Googlebot also support * and $ wildcards. To avoid matching more pages than you intend, you may need to add a $ to the end of the page path in robots.txt:

Disallow: /thank-you-for-commenting$

If you don't, you'll also disallow the page /thank-you-for-commenting-on-this-too.
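To make the difference concrete, here is a small Python sketch that mimics the Googlebot-style wildcard matching described above: a rule is a prefix match, * matches any run of characters, and a trailing $ anchors the end of the path. The rule_matches helper is purely illustrative, not part of any standard library:

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Check a Disallow rule against a URL path, using the wildcard
    syntax major crawlers support: '*' matches any run of characters
    and a trailing '$' anchors the match at the end of the path."""
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"  # anchor at end of path
    # re.match already anchors at the start, giving prefix semantics
    return re.match(pattern, path) is not None

# Without '$' the rule is a prefix match, so it blocks too much:
print(rule_matches("/thank-you-for-commenting",
                   "/thank-you-for-commenting-on-this-too"))  # True
# With '$' only the exact path is blocked:
print(rule_matches("/thank-you-for-commenting$",
                   "/thank-you-for-commenting-on-this-too"))  # False
print(rule_matches("/thank-you-for-commenting$",
                   "/thank-you-for-commenting"))              # True
```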
Take a look at the last.fm robots.txt file for inspiration.
You can also disallow a specific page, including its file extension, in the robots.txt file. For instance, while testing, you can list the path of a test page to keep robots from crawling it.
For example, the first rule

Disallow: /index_test.php

will disallow bots from crawling the test page in the root folder. The second,

Disallow: /products/test_product.html

will disallow test_product.html under the 'products' folder. Finally, the last example,

Disallow: /products/

will disallow crawling of the whole folder.
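As a quick way to sanity-check rules like these before deploying them, Python's standard urllib.robotparser module can evaluate a robots.txt against sample URLs. Note that it implements only plain prefix matching, not the $ and * wildcard extensions, and example.com here is just a placeholder domain:

```python
from urllib.robotparser import RobotFileParser

# The three example rules from above, as robots.txt lines
rules = """\
User-agent: *
Disallow: /index_test.php
Disallow: /products/test_product.html
Disallow: /products/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# The Disallow rules block these paths...
print(rp.can_fetch("*", "https://example.com/index_test.php"))         # False
print(rp.can_fetch("*", "https://example.com/products/anything.html"))  # False
# ...but everything else stays crawlable.
print(rp.can_fetch("*", "https://example.com/about"))                   # True
```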