I'm trying to figure out how to display the text and images I have scraped in my application's HTML.
Here is my app/scrape2.rb file:
require 'nokogiri'
require 'open-uri'
url = "https://marketplace.asos.com/boutiques/independent-label"
doc = Nokogiri::HTML(open(url))
label = doc.css('#boutiqueList')
@label = label.css('#boutiqueList img').map { |l| p l.attr('src') }
@title = label.css("#boutiqueList .notranslate").map { |o| p o.text }
Here is the controller:
class PagesController < ApplicationController
  def about
    # used to change the routing to /about
  end

  def index
    @label = label.css('#boutiqueList img').map { |l| p l.attr('src') }
    @title = label.css("#boutiqueList .notranslate").map { |o| p o.text }
  end
end
and finally the label.html.erb page:
<% @label.each do |image| %>
<%= image_tag image %>
<% end %>
Do I need some other method, or am I not storing the arrays properly?
Your controller needs to load the data itself, or somehow pull the data from scrape2.rb. Controllers do not have access to other files unless specified (include, extend, etc).
require 'nokogiri'
require 'open-uri'

class PagesController < ApplicationController
  def index
    # Call these in your controller:
    url = "https://marketplace.asos.com/boutiques/independent-label"
    doc = Nokogiri::HTML(open(url))
    label = doc.css('#boutiqueList')
    @label = label.css('#boutiqueList img').map { |l| p l.attr('src') }
    @title = label.css("#boutiqueList .notranslate").map { |o| p o.text }
  end
end
You're not parsing the data correctly.
label = doc.css('#boutiqueList')
should be:
label = doc.at('#boutiqueList')
#boutiqueList is an ID, of which only one can exist in a document at a time. css returns a NodeSet, which is like an Array, but you really want to point to the Node itself, which is what at would do. at is equivalent to search('...').first.
Then you use:
label.css('#boutiqueList img')
which is also wrong. label is supposed to already point to the node containing #boutiqueList, but then you want Nokogiri to look inside that node and find additional nodes with id="boutiqueList" that contain <img> tags. But, again, because #boutiqueList is an ID and it can't occur more than once in the document, Nokogiri can't find any nodes:
label.css('#boutiqueList img').size # => 0
whereas using label.css('img') correctly finds the <img> nodes:
label.css('img').size # => 48
Then you use map to print out values, but map is meant to transform the elements of an Array as it iterates over it, returning a new Array. p will return the value it outputs, but it's bad form to rely on the returned value of p in a map. Instead you should use map to convert the values, then puts the result if you need to see it:
@label = label.css('img').map { |l| l.attr('src') }
puts @label
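To see why relying on the return value of a print call inside map is fragile, compare p with puts. This is a plain-Ruby sketch with made-up values and no Nokogiri involved:

```ruby
srcs = ['/img/a.jpg', '/img/b.jpg'] # made-up values

# Works only because p happens to return its argument;
# the printing is a side effect of building the array.
with_p = srcs.map { |s| p s }

# puts returns nil, so the same pattern silently yields [nil, nil].
with_puts = srcs.map { |s| puts s }

# Clearer: transform the values first, then print the result separately.
names = srcs.map { |s| File.basename(s) }
puts names
```

Swapping p for puts during debugging would quietly break the first version, which is the main reason to keep the transformation and the output as separate steps.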
Instead of using attr('src'), I'd write the first line as:
@label = label.css('img').map { |l| l['src'] }
The same is true of:
@title = label.css("#boutiqueList .notranslate").map { |o| p o.text }