I'm using the following code to generate a JSON file containing all category information for a particular website.
```ruby
require 'json'
require 'mechanize'

@hashes = []
@categories_hash = {}
@categories_hash['category'] ||= {}
@categories_hash['category']['id'] ||= {}
@categories_hash['category']['name'] ||= {}
@categories_hash['category']['group'] ||= {}

# Initialize Mechanize object
a = Mechanize.new

# Begin scraping
a.get('http://www.marktplaats.nl/') do |page|
  groups = page.search('//*[(@id = "navigation-categories")]//a')
  groups.each_with_index do |group, index_1|
    a.get(group[:href]) do |page_2|
      categories = page_2.search('//*[(@id = "category-browser")]//a')
      categories.each_with_index do |category, index_2|
        @categories_hash['category']['id'] = "#{index_1}_#{index_2}"
        @categories_hash['category']['name'] = category.text
        @categories_hash['category']['group'] = group.text
        @hashes << @categories_hash['category']

        # Print each entry as it is written (comment out to silence)
        puts @categories_hash['category'].to_json
      end
    end
  end
end

File.open("json/magic/#{Time.now.strftime '%Y%m%d%H%M%S'}_magic_categories.json", 'w') do |f|
  puts '# Writing category data to JSON file'
  f.write(@hashes.to_json)
  puts "|-----------> Done. #{@hashes.length} written."
end
puts '# Finished.'
```
But this code produces a JSON file in which every entry contains only the data of the last category. For the full JSON file, take a look here. This is a sample:
```json
[
  {
    "id": "36_17",
    "name": "Overige Diversen",
    "group": "Diversen"
  },
  {
    "id": "36_17",
    "name": "Overige Diversen",
    "group": "Diversen"
  },
  {
    "id": "36_17",
    "name": "Overige Diversen",
    "group": "Diversen"
  }, {...}
]
```
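The same symptom can be reproduced without Mechanize. This minimal sketch replaces the scraped pages with a hypothetical hard-coded `groups` hash (an assumption, not real site data) but keeps the same pattern of filling one shared Hash and appending it on every iteration:

```ruby
require 'json'

# Hypothetical stand-in for the scraped groups and their category links
# (assumption: replaces the Mechanize page searches).
groups = {
  'Auto' => ['Audi', 'BMW'],
  'Diversen' => ['Overige Diversen']
}

hashes = []
categories_hash = { 'category' => {} }

groups.each_with_index do |(group, categories), index_1|
  categories.each_with_index do |category, index_2|
    # Same pattern as the scraper: mutate one shared Hash, then append it.
    categories_hash['category']['id'] = "#{index_1}_#{index_2}"
    categories_hash['category']['name'] = category
    categories_hash['category']['group'] = group
    hashes << categories_hash['category']
  end
end

puts hashes.to_json
# Count how many distinct objects the array actually holds.
puts hashes.map(&:object_id).uniq.length
```

Running it prints three identical entries, all carrying the last category's data (`"id":"1_0"`, `"Overige Diversen"`), followed by `1`: every element of the array is the same Hash object.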
The question is, what's causing this and how can I solve it?
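One possible fix, assuming the cause is that `@hashes << @categories_hash['category']` appends a reference to the same Hash on every iteration (so each later assignment also changes what the earlier array elements show): build a new Hash per category instead. A minimal sketch, using hypothetical hard-coded `groups` data in place of the scraped pages:

```ruby
require 'json'

# Hypothetical stand-in for the scraped groups and their category links
# (assumption: replaces the Mechanize page searches).
groups = {
  'Auto' => ['Audi', 'BMW'],
  'Diversen' => ['Overige Diversen']
}

hashes = []

groups.each_with_index do |(group, categories), index_1|
  categories.each_with_index do |category, index_2|
    # Build a fresh Hash on every iteration so each array element
    # is an independent object.
    hashes << {
      'id' => "#{index_1}_#{index_2}",
      'name' => category,
      'group' => group
    }
  end
end

puts hashes.to_json
```

In the original scraper, the equivalent change would be pushing a fresh hash literal (or `@categories_hash['category'].dup`) inside the inner loop, rather than the shared `@categories_hash['category']` object.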