
Using wget via Ruby on Rails

Posted 2020-06-17 06:37

Question:

I want to build a simple website that can download a webpage www.example.com/index.html and store its snapshot on the server when the client requests. I'm thinking about using the command wget to download the webpage. Would Ruby on Rails be able to handle this task?

Answer 1:

Yes.

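You can run shell commands in Ruby with backticks, exec, and system. Each behaves differently: backticks return the command's standard output as a string; exec replaces the current Ruby process with the command and never returns, so it is not suitable inside a running Rails server; system runs the command in a subprocess and returns true, false, or nil depending on whether it succeeded.

  1. backticks:

    `wget http://www.yahoo.com`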
    
  2. exec:

    exec('wget http://www.yahoo.com')
    
  3. system:

    system('wget http://www.yahoo.com')
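
To make the difference in return values concrete, here is a minimal sketch that can be pasted into irb or the Rails console. It assumes a Unix-like system where echo, true, and false are available, and it uses them in place of wget so that nothing is downloaded. The command name no-such-command-xyz is made up for illustration.

```ruby
# Backticks capture the command's standard output as a String.
output = `echo hello`
puts output.inspect

# $? holds the Process::Status of the most recently finished child process.
puts $?.success?

# system returns true on a zero exit status, false on a non-zero exit status,
# and nil when the command could not be executed at all.
ok      = system('true')
failed  = system('false')
missing = system('no-such-command-xyz') # hypothetical, non-existent command

puts [ok, failed, missing].inspect

# exec is deliberately not called here: it would replace this Ruby process
# with the command, and nothing after it would run.
```

For a download triggered from a web request, system (or backticks, if you need the output) is usually the safer choice of the three.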
    

This blog post seems to be in the same vein as what you're trying to do.

Additionally, there are several terrific Ruby libraries for doing this:

  1. mechanize with mechanize download - check out this railscast
  2. httparty - simple wrapper around a more-difficult-to-use http library. Once you get the response body, you will need to save it to the database or file.
  3. typhoeus - simple mechanism for making the http requests in parallel, if you need such an ability

These provide a much cleaner Ruby interface for handling the data returned by the requests.
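
As a rough illustration of the "get the response body, then save it" flow without shelling out, here is a sketch that uses Net::HTTP from Ruby's standard library rather than any of the gems listed above. The helper names fetch_body and save_snapshot are hypothetical, and this simple version does not follow redirects.

```ruby
require 'net/http'
require 'uri'
require 'fileutils'

# Hypothetical helper: fetch a page and return its body as a String.
# Raises if the server does not answer with a 2xx status.
def fetch_body(url)
  response = Net::HTTP.get_response(URI.parse(url))
  raise "Request failed with status #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  response.body
end

# Hypothetical helper: write the body to a file, creating the parent
# directory if it does not exist yet. Returns the path that was written.
def save_snapshot(body, path)
  FileUtils.mkdir_p(File.dirname(path))
  File.binwrite(path, body)
  path
end

# Example usage (performs a real network request, so it is left commented out):
# save_snapshot(fetch_body('http://www.example.com/index.html'), 'tmp/snapshots/index.html')
```

The gems above wrap the same two steps in a friendlier API; the stdlib version is shown only because it needs no extra dependencies.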


The best way to test all of these options is to use the Rails console. Go to the root directory of your Rails app and type:

rails c

Once in the console, you can emulate the actual server calls.

Running wget in your console will drop the files in your Rails root directory, which is not what you want. tmp is a standard directory for such things. You can dynamically generate the path based on the URL like so:

require 'digest'

# tmp directory
path = Rails.root.join('tmp')
# create sub-directory name as an MD5 hash of the URL
sub_dir = Digest::MD5.hexdigest(url)
# append sub_dir to the path
destination_path = path.join(sub_dir)
# pass arguments separately so the URL is never interpreted by a shell
system('wget', '-P', destination_path.to_s, url)
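
Because the URL comes from the client, it is worth being defensive before handing it to wget. The following is a sketch under a few assumptions: the helper names snapshot_dir and download_snapshot are hypothetical, only http and https URLs are accepted, and the command keyword exists purely so the helper can be exercised without wget installed.

```ruby
require 'digest'
require 'fileutils'

# Hypothetical helper: directory for a URL, named by the MD5 hash of the URL.
def snapshot_dir(url, base_dir)
  File.join(base_dir, Digest::MD5.hexdigest(url))
end

# Hypothetical helper: download a URL into its snapshot directory.
# Returns the directory on success and raises on failure.
def download_snapshot(url, base_dir, command: 'wget')
  # Reject anything that is not a plain http(s) URL, including values
  # that start with "-" and would be read by wget as an option.
  raise ArgumentError, "unsupported URL: #{url}" unless url.match?(%r{\Ahttps?://})

  dir = snapshot_dir(url, base_dir)
  FileUtils.mkdir_p(dir)

  # Passing each argument separately bypasses the shell, so characters
  # such as ";" or "&" in the URL cannot start another command.
  ok = system(command, '-P', dir, url)
  raise "#{command} failed for #{url}" unless ok

  dir
end
```

In a Rails app this could be called as download_snapshot(url, Rails.root.join('tmp').to_s), for example from a controller action or a background job.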

Be sure to also include the options from this post