I'm having issues getting data from GitHub Archive.
The main issue is my problem with encoding {} and .. in my URL. Maybe I am misreading the GitHub API or not understanding encoding correctly.
require 'open-uri'
require 'faraday'
conn = Faraday.new(:url => 'http://data.githubarchive.org/') do |faraday|
  faraday.request :url_encoded # form-encode POST params
  faraday.response :logger # log requests to STDOUT
  faraday.adapter Faraday.default_adapter # make requests with Net::HTTP
end
#query = '2015-01-01-15.json.gz' #this one works!!
query = '2015-01-01-{0..23}.json.gz' #this one doesn't work
encoded_query = URI.encode(query)
response = conn.get(encoded_query)
p response.body
To get a better idea of what's going wrong, let's start with the example given in the GitHub documentation:
The thing to note here is that {0..23} is automagically getting expanded by bash. You can see this by running the following command:
This means wget doesn't receive a single URL, but instead receives 24 of them, one per hour, and downloads each in turn. The problem you're having is that Ruby doesn't automagically expand {0..23} like bash does, and instead you're making a literal call to http://data.githubarchive.org/2015-01-01-{0..23}.json.gz, which doesn't exist.
Instead you will need to loop through 0..23 yourself and make a single call every time:
The GitHub Archive example for retrieving a range of files is:
The {0..23} part is being interpreted by wget itself as a range of 0 .. 23. You can test this by executing that command with the -v flag, which returns:
In other words, wget is substituting values into the URL and then getting that new URL. This isn't obvious behavior, nor is it well documented, but you can find mention of it "out there". For instance in "All the Wget Commands You Should Know":
To do what you want, you need to iterate over the range in Ruby using something like this untested code:
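A minimal sketch of that loop, assuming the hourly files are named with the hours 0 through 23 exactly as in the question's URL. It uses Net::HTTP from the standard library rather than Faraday so that it needs no gems. The helper names archive_urls and fetch_archives are made up for illustration and are not part of any library.

```ruby
require 'net/http'
require 'uri'

# Host taken from the question.
BASE_URL = 'http://data.githubarchive.org/'

# Build one URL per hour. This does by hand what the {0..23}
# pattern does on the command line.
def archive_urls(date, hours = 0..23)
  hours.map { |hour| "#{BASE_URL}#{date}-#{hour}.json.gz" }
end

# Issue a separate GET request for each hourly file and return
# [url, response] pairs. (Hypothetical helper, no retries or
# redirect handling.)
def fetch_archives(date, hours = 0..23)
  archive_urls(date, hours).map do |url|
    [url, Net::HTTP.get_response(URI(url))]
  end
end
```

Calling fetch_archives('2015-01-01') would then make 24 requests, one per file. Each response body should still be gzip-compressed, as the .json.gz extension suggests, so it would need to be decompressed (for example with Zlib::GzipReader) before the JSON can be parsed.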