URL encoding issues with curly braces

I'm having issues getting data from GitHub Archive.

The main problem is encoding {} and .. in my URL. Maybe I am misreading the GitHub API or not understanding encoding correctly.

require 'open-uri'
require 'faraday'

conn = Faraday.new(:url => 'http://data.githubarchive.org/') do |faraday|
  faraday.request  :url_encoded             # form-encode POST params
  faraday.response :logger                  # log requests to STDOUT
  faraday.adapter  Faraday.default_adapter  # make requests with Net::HTTP
end

#query = '2015-01-01-15.json.gz' #this one works!!
query = '2015-01-01-{0..23}.json.gz' #this one doesn't work
encoded_query = URI.encode(query)

response = conn.get(encoded_query)
p response.body

Tags: ruby github-api url-encoding faraday

2 Answers

Post #2 · 2019-06-08 16:01

To get a better idea of what's going wrong, let's start with the example given in the GitHub documentation:

wget http://data.githubarchive.org/2015-01-01-{0..23}.json.gz

The thing to note here is that {0..23} is expanded by bash (brace expansion) before wget ever runs. You can see this by running the following command:

echo {0..23}
> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

This means wget never sees the braces: bash passes it 24 separate URLs, and wget downloads each one in turn. The problem you're having is that Ruby doesn't expand {0..23} like bash does, so you're making a literal request to http://data.githubarchive.org/2015-01-01-{0..23}.json.gz, which doesn't exist.
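As a rough illustration of what the shell hands to wget, the same expansion can be reproduced in Ruby with a range. This is only a sketch: `BASE_URL` and `hourly_archive_urls` are names made up for this example, not part of Faraday or the GitHub Archive tooling.

```ruby
# Hypothetical helper: builds the URLs that bash's {0..23} expansion would produce.
BASE_URL = 'http://data.githubarchive.org/'

def hourly_archive_urls(date, hours = 0..23)
  # One concrete URL per hour, with no braces left in it.
  hours.map { |hour| "#{BASE_URL}#{date}-#{hour}.json.gz" }
end

urls = hourly_archive_urls('2015-01-01')
puts urls.first
puts urls.last
puts urls.size
```

Each element of `urls` is a plain URL of the same form as the single-file query that already works in the question.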

Instead you will need to loop through 0..23 yourself and make one request per file:

(0..23).each do |n|
  # The file names contain only URL-safe characters, so no encoding step is needed
  # (URI.encode was removed in Ruby 3.0).
  response = conn.get("2015-01-01-#{n}.json.gz")
  p response.body
end

Post #3 · 2019-06-08 16:08

The GitHub Archive example for retrieving a range of files is:

wget http://data.githubarchive.org/2015-01-01-{0..23}.json.gz

The {0..23} part is expanded into the range 0 .. 23 before any request is made; this is the shell's brace expansion, not something wget or the server does. You can see the effect by running the command with the -v flag, which prints:

wget -v http://data.githubarchive.org/2015-01-01-{0..1}.json.gz
--2015-06-11 13:31:07--  http://data.githubarchive.org/2015-01-01-0.json.gz
Resolving data.githubarchive.org... 74.125.25.128, 2607:f8b0:400e:c03::80
Connecting to data.githubarchive.org|74.125.25.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2615399 (2.5M) [application/x-gzip]
Saving to: '2015-01-01-0.json.gz'

2015-01-01-0.json.gz                                        100%[===========================================================================================================================================>]   2.49M  3.03MB/s   in 0.8s

2015-06-11 13:31:09 (3.03 MB/s) - '2015-01-01-0.json.gz' saved [2615399/2615399]

--2015-06-11 13:31:09--  http://data.githubarchive.org/2015-01-01-1.json.gz
Reusing existing connection to data.githubarchive.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 2535599 (2.4M) [application/x-gzip]
Saving to: '2015-01-01-1.json.gz'

2015-01-01-1.json.gz                                        100%[===========================================================================================================================================>]   2.42M   867KB/s   in 2.9s

2015-06-11 13:31:11 (867 KB/s) - '2015-01-01-1.json.gz' saved [2535599/2535599]

FINISHED --2015-06-11 13:31:11--
Total wall clock time: 4.3s
Downloaded: 2 files, 4.9M in 3.7s (1.33 MB/s)

In other words, each request goes to a concrete URL with the number already substituted. The shell performs that substitution before wget starts, so wget never sees the braces. This isn't obvious from the command alone, but the idiom shows up in wget guides, for instance in "All the Wget Commands You Should Know":

7. Download a list of sequentially numbered files from a server
wget http://example.com/images/{1..20}.jpg

To do what you want, you need to iterate over the range in Ruby using something like this untested code:

0.upto(23) do |i|
  response = conn.get("/2015-01-01-#{ i }.json.gz")
  p response.body
end
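Since the files are .json.gz, `response.body` will be gzip-compressed bytes rather than readable JSON. Below is a minimal sketch of unpacking one body, assuming each file is a single gzip stream containing one JSON object per line (an assumption about the archive format, not something stated in this thread). `parse_archive_body` is a hypothetical helper, and the sample body is built locally so the snippet runs without network access.

```ruby
require 'zlib'
require 'json'

# Hypothetical helper: gunzip a response body and parse each line as JSON.
# Assumes a single gzip stream of newline-delimited JSON.
def parse_archive_body(gzipped_body)
  text = Zlib.gunzip(gzipped_body).force_encoding(Encoding::UTF_8)
  text.each_line.map { |line| JSON.parse(line) }
end

# Stand-in for response.body, compressed locally instead of downloaded.
sample_body = Zlib.gzip(%({"type":"PushEvent"}\n{"type":"WatchEvent"}\n))

events = parse_archive_body(sample_body)
p events.map { |event| event['type'] }
```

Inside the loop above, the same helper could be applied to `response.body` in place of printing the raw bytes.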
