Trouble extracting individual JSON values in Ruby

Posted 2019-09-03 04:15

Question:

I'm in the process of trying to scrape reddit (API-free) and I've run into a brick wall. On reddit, every page has a JSON representation that can be seen simply by appending .json to the end, e.g. https://www.reddit.com/r/AskReddit.json.

I installed the neatjson gem and wrote a small chunk of code to clean the JSON up and print it:

require "rubygems"
require "json"
require "net/http"
require "uri"
require 'open-uri'
require 'neatjson'

url = "https://www.reddit.com/r/AskReddit.json"

result = JSON.parse(URI.open(url).read)

neatJS = JSON.neat_generate(result, wrap: 40, short: true, sorted: true, aligned: true, aroundColonN: 1)

puts neatJS

And it works fine. (The output goes on for several pages; the full JSON is here: http://pastebin.com/HDzFXqyU)

However, when I changed it to extract only the values I want:

url = "https://www.reddit.com/r/AskReddit.json"

result = JSON.parse(URI.open(url).read)

neatJS = JSON.neat_generate(result, wrap: 40, short: true, sorted: true, aligned: true, aroundColonN: 1)

neatJS.each do |data|
  puts data["title"]
  puts data["url"]
  puts data["id"]
end

It gave me an error:

  002----extractallaskredditthreads.rb:17:in `<main>': undefined method `each' for #<String:0x0055f948da9ae8> (NoMethodError)

I've been trying different variations of the extractor for about two days and none of them have worked. I feel like I'm missing something incredibly obvious. If anyone could point out what I'm doing wrong, that would be appreciated.

EDIT

It turns out I had the wrong variable name:

 neatSJ =/= neatJS

However, correcting this only changes the error I got:

 002----extractallaskredditthreads.rb:17:in `<main>': undefined method `each' for #<String:0x0055f948da9ae8> (NoMethodError)

And as I said, I have been attempting multiple ways of extracting the tags, which may have caused my typo.

Answer 1:

In this code:

result = JSON.parse(URI.open(url).read)

neatJS = JSON.neat_generate(result, wrap: 40, short: true, sorted: true, aligned: true, aroundColonN: 1)

...result is a Ruby Hash, produced by JSON.parse when it turns the JSON text into Ruby objects. neatJS, by contrast, is a String: the pretty-printed text that JSON.neat_generate builds from the result Hash. String does not define each, which is exactly what the NoMethodError is telling you. To access the values inside the JSON structure, use the result object, not the neatJS string:

children = result["data"]["children"]

children.each do |child|
  puts child["data"]["title"]
  puts child["data"]["url"]
  puts child["data"]["id"]
end
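
To see the Hash-versus-String distinction without hitting the network, here is a minimal sketch. It assumes the listing has the data -> children -> data nesting used above; the sample JSON, its titles, URLs and ids, and the extract_posts helper are all made up for illustration, and the standard library's JSON.pretty_generate stands in for JSON.neat_generate so that no extra gem is needed.

```ruby
require "json"

# Hypothetical, trimmed-down sample that only mimics the nesting used above
# (data -> children -> data). The titles, URLs and ids are invented.
SAMPLE_JSON = <<~SAMPLE
  {
    "data": {
      "children": [
        { "data": { "title": "First thread", "url": "https://example.com/1", "id": "abc123" } },
        { "data": { "title": "Second thread", "url": "https://example.com/2", "id": "def456" } }
      ]
    }
  }
SAMPLE

# Hypothetical helper: pulls title, url and id out of an already-parsed listing Hash.
def extract_posts(listing)
  listing["data"]["children"].map do |child|
    data = child["data"]
    { title: data["title"], url: data["url"], id: data["id"] }
  end
end

result = JSON.parse(SAMPLE_JSON)       # parsed structure: this is what you iterate over
pretty = JSON.pretty_generate(result)  # formatted text: only useful for printing

puts result.class                # Hash
puts pretty.class                # String
puts pretty.respond_to?(:each)   # false, hence the NoMethodError

extract_posts(result).each do |post|
  puts "#{post[:id]}: #{post[:title]} (#{post[:url]})"
end
```

The pretty-printed string is kept only for display; all lookups go through the parsed Hash.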


Answer 2:

Is it a typo?

neatJS = JSON.neat_generate
[...]
neatSJ.each do |data|


Tags: ruby json reddit