I have staging and production apps on Heroku.
For crawlers, I set up a robots.txt file.
After that, I got this message from Google:
Dear Webmaster, The host name of your site, https://www.myapp.com/, does not match any of the "Subject Names" in your SSL certificate, which were:
*.herokuapp.com
herokuapp.com
Googlebot read the robots.txt on my staging app and sent this message, because I hadn't set anything up to prevent crawlers from reading the file.
So what I'm thinking is to vary the .gitignore file between staging and production, but I can't figure out how to do this.
What are the best practices for implementing this?
EDIT
I googled this and found this article: http://goo.gl/2ZHal
It says that if you set up basic Rack authentication, you won't need to care about robots.txt.
I didn't know that basic auth could keep out Googlebot. It seems this solution is better than manipulating the .gitignore file.
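For reference, here is a minimal sketch of that approach, assuming a dedicated "staging" Rails environment and an app named MyApp (both placeholders; on Heroku you could gate this on an ENV variable instead):

    # config/environments/staging.rb -- assumes a separate staging environment
    MyApp::Application.configure do
      # Password-protect the whole app so crawlers get a 401
      # and never see robots.txt or anything else.
      config.middleware.use Rack::Auth::Basic, "Staging" do |username, password|
        username == "staging" && password == "secret"  # placeholder credentials
      end
    end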
What about serving /robots.txt dynamically using a controller action instead of having a static file? Depending on the environment, you allow or disallow search engines to index your application. A great solution with Rails 3 is to use Rack. Here is a great post that outlines the process: Serving Different Robots.txt Using Rack. To summarize, you add this to your routes.rb:
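Something along these lines (a sketch, not the post's exact code; MyApp is a placeholder for your application name):

    # config/routes.rb
    require "robots_generator"  # lib/ is not autoloaded by default in Rails 3

    MyApp::Application.routes.draw do
      # Hand /robots.txt to a plain Rack endpoint instead of a static file
      match "/robots.txt" => RobotsGenerator
    end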
and then create a new file at lib/robots_generator.rb:
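Roughly like this (a sketch following the linked post's idea; the environment check and file location are assumptions to adapt):

    # lib/robots_generator.rb
    class RobotsGenerator
      # Rack endpoint: serve config/robots.txt in production and
      # disallow all crawling in every other environment.
      def self.call(env)
        body = if Rails.env.production?
                 File.read Rails.root.join("config", "robots.txt")
               else
                 "User-agent: *\nDisallow: /"
               end
        [200, { "Content-Type" => "text/plain" }, [body]]
      rescue Errno::ENOENT
        [404, { "Content-Type" => "text/plain" }, ["# robots.txt not found"]]
      end
    end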
Finally, make sure to move robots.txt into your config folder (or wherever you read it from in your RobotsGenerator class), and delete the static public/robots.txt so it isn't served ahead of the route.
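For completeness, the production config/robots.txt can stay as simple as an allow-all file (contents are up to you):

    # config/robots.txt -- only served in production by the generator above
    User-agent: *
    Disallow: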