I have staging and production apps on Heroku.
For crawlers, I set up a robots.txt file.
After that, I got this message from Google:
Dear Webmaster, The host name of your site, https://www.myapp.com/, does not match any of the "Subject Names" in your SSL certificate, which were:
*.herokuapp.com
herokuapp.com
Googlebot read the robots.txt on my staging app and sent this message, because I hadn't set anything up to prevent crawlers from reading the file.
So what I'm thinking is to vary the .gitignore file between staging and production, but I can't figure out how to do this.
What are the best practices for implementing this?
EDIT
I googled this and found this article: http://goo.gl/2ZHal
It says that if you set up basic Rack authentication, you won't need to care about robots.txt.
I didn't know that basic auth could keep out Googlebot. It seems this solution is better than manipulating the .gitignore file.
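For reference, here is a minimal sketch of that approach, assuming a dedicated "staging" Rails environment and an app named MyApp (both placeholders; on Heroku you could gate this on an ENV variable instead):

    # config/environments/staging.rb -- assumes a separate staging environment
    MyApp::Application.configure do
      # Password-protect the whole app so crawlers get a 401
      # and never see robots.txt or anything else.
      config.middleware.use Rack::Auth::Basic, "Staging" do |username, password|
        username == "staging" && password == "secret"  # placeholder credentials
      end
    end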
What about serving /robots.txt dynamically using a controller action instead of having a static file? Depending on the environment, you allow or disallow search engines to index your application. A great solution with Rails 3 is to use Rack. Here is a great post that outlines the process: Serving Different Robots.txt Using Rack. To summarize, you add this to your routes.rb:
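Something along these lines (a sketch, not the post's exact code; MyApp is a placeholder for your application name):

    # config/routes.rb
    require "robots_generator"  # lib/ is not autoloaded by default in Rails 3

    MyApp::Application.routes.draw do
      # Hand /robots.txt to a plain Rack endpoint instead of a static file
      match "/robots.txt" => RobotsGenerator
    end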
and then create a new file at lib/robots_generator.rb:
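Roughly like this (a sketch following the linked post's idea; the environment check and file location are assumptions to adapt):

    # lib/robots_generator.rb
    class RobotsGenerator
      # Rack endpoint: serve config/robots.txt in production and
      # disallow all crawling in every other environment.
      def self.call(env)
        body = if Rails.env.production?
                 File.read Rails.root.join("config", "robots.txt")
               else
                 "User-agent: *\nDisallow: /"
               end
        [200, { "Content-Type" => "text/plain" }, [body]]
      rescue Errno::ENOENT
        [404, { "Content-Type" => "text/plain" }, ["# robots.txt not found"]]
      end
    end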
Finally, make sure to move robots.txt into your config folder (or wherever you read it from in your RobotsGenerator class), and delete the static public/robots.txt so it isn't served ahead of the route.
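For completeness, the production config/robots.txt can stay as simple as an allow-all file (contents are up to you):

    # config/robots.txt -- only served in production by the generator above
    User-agent: *
    Disallow: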