I'm trying to scrape reddit without using the API and I've run into a brick wall. On reddit, every page has a JSON representation that can be seen simply by appending .json to the end of the URL, e.g. https://www.reddit.com/r/AskReddit.json.
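As a small illustration of the .json convention described above, here is a minimal sketch that builds the JSON URL from a page URL. The helper name reddit_json_url is hypothetical, and it assumes the URL has a non-empty path (a subreddit or thread), not the bare domain.

```ruby
require "uri"

# Hypothetical helper: turn a reddit page URL into its JSON counterpart
# by appending ".json" to the path. Assumes a non-empty path such as
# /r/AskReddit; any query string is left untouched.
def reddit_json_url(page_url)
  uri = URI.parse(page_url)
  unless uri.path.end_with?(".json")
    # Drop a trailing slash so we get /r/AskReddit.json, not /r/AskReddit/.json
    uri.path = uri.path.chomp("/") + ".json"
  end
  uri.to_s
end

puts reddit_json_url("https://www.reddit.com/r/AskReddit")
```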
I installed the NeatJSON gem and wrote a small chunk of code to clean the JSON up and print it:
require "rubygems"
require "json"
require "net/http"
require "uri"
require 'open-uri'
require 'neatjson'
url = ("https://www.reddit.com/r/AskReddit.json")
result = JSON.parse(open(url).read)
neatJS = JSON.neat_generate(result, wrap: 40, short: true, sorted: true, aligned: true, aroundColonN: 1)
puts neatJS
And it works fine:
(The output goes on for another few pages; the full JSON is here: http://pastebin.com/HDzFXqyU)
However, when I changed it to extract only the values I want:
url = ("https://www.reddit.com/r/AskReddit.json")
result = JSON.parse(open(url).read)
neatJS = JSON.neat_generate(result, wrap: 40, short: true, sorted: true, aligned: true, aroundColonN: 1)
neatJS.each do |data|
  puts data["title"]
  puts data["url"]
  puts data["id"]
end
It gave me an error:
002----extractallaskredditthreads.rb:17:in `<main>': undefined method `each' for #<String:0x0055f948da9ae8> (NoMethodError)
I've been trying different variations of the extractor for about two days and none of them have worked. I feel like I'm missing something incredibly obvious. If anyone could point out what I'm doing wrong, that would be appreciated.
EDIT
It turns out I had mistyped the variable name (neatSJ instead of neatJS).
However, correcting this only changed the error I got:
002----extractallaskredditthreads.rb:17:in `<main>': undefined method `each' for #<String:0x0055f948da9ae8> (NoMethodError)
And as I said, I have been attempting multiple ways of extracting the tags, which may have caused my typo.
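The error message itself points at the likely cause: the object receiving each is a String. JSON.neat_generate returns formatted text, not a data structure, and Ruby strings have no each method. The sketch below iterates over the parsed Hash instead. It rests on a few assumptions: the sample JSON is made up (so the snippet runs without network access), it follows the usual reddit listing layout (data, then children, then each child's data), and JSON.pretty_generate from the standard library stands in for JSON.neat_generate.

```ruby
require "json"

# Made-up sample that mimics a reddit listing: data -> children -> data.
sample = <<~JSON
  {
    "kind": "Listing",
    "data": {
      "children": [
        { "kind": "t3", "data": { "id": "abc123", "title": "First example thread",
                                  "url": "https://www.reddit.com/r/AskReddit/comments/abc123/" } },
        { "kind": "t3", "data": { "id": "def456", "title": "Second example thread",
                                  "url": "https://www.reddit.com/r/AskReddit/comments/def456/" } }
      ]
    }
  }
JSON

result = JSON.parse(sample)            # Hash: the structure to walk
pretty = JSON.pretty_generate(result)  # String: stand-in for JSON.neat_generate

puts pretty.class                      # a String only holds text for display

# Each child wraps the actual post fields in its own "data" hash.
posts = result["data"]["children"].map { |child| child["data"] }

posts.each do |post|
  puts post["title"]
  puts post["url"]
  puts post["id"]
end
```

The formatted string is only useful for printing; the values themselves have to come from the Hash returned by JSON.parse.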