Nokogiri displaying data in view

Posted 2019-07-24 14:07

Question:

I'm trying to figure out how to display the text and images I have scraped in my application's HTML. Here is my app/scrape2.rb file:

require 'nokogiri'
require 'open-uri'

url = "https://marketplace.asos.com/boutiques/independent-label"

doc = Nokogiri::HTML(open(url))

label = doc.css('#boutiqueList')

@label = label.css('#boutiqueList img').map { |l| p l.attr('src') }
@title = label.css("#boutiqueList .notranslate").map { |o| p o.text }

Here is the controller:

class PagesController < ApplicationController
  def about
    # used to change the routing to /about
  end

  def index
    @label = label.css('#boutiqueList img').map { |l| p l.attr('src') }
    @title = label.css("#boutiqueList .notranslate").map { |o| p o.text }
  end
end

And finally, the label.html.erb page:

<% @label.each do |image| %>
<%= image_tag image %>
<% end %>

Do I need some other method, or am I not storing the arrays properly?

Answer 1:

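Your controller needs to load the data itself, or somehow pull it in from scrape2.rb. In the controller as written, label is never defined, so calling label.css raises an error before anything reaches the view. Nor can the controller see the label local variable or the @label/@title instance variables from scrape2.rb: local variables are scoped to the file that defines them, and top-level instance variables belong to Ruby's main object, not to your controller. Code in another file is only available to the controller if you explicitly load it and expose it, for example as a method or module (require, include, extend, etc.).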
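A minimal pure-Ruby sketch of the scoping problem (no Rails or network needed). FakePagesController is a hypothetical stand-in for the real controller, and the image names are made up:

```ruby
# Simulates scrape2.rb: at the top level, self is Ruby's `main` object,
# so this @label is an instance variable of `main`.
@label = ["a.png", "b.png"]

# Hypothetical stand-in for PagesController (plain Ruby, no Rails).
class FakePagesController
  def index
    # This @label belongs to the controller instance and was never assigned,
    # so reading it returns nil instead of the scraped array.
    @label
  end
end

p @label                        # prints ["a.png", "b.png"]
p FakePagesController.new.index # prints nil
```

Local variables such as label are even more restricted: they never leave the file they are defined in, even when that file is required. That is why the answer moves the scraping into the controller action itself: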

require 'nokogiri'
require 'open-uri'

class PagesController < ApplicationController
  def index
    # Call these in your controller:
    url = "https://marketplace.asos.com/boutiques/independent-label"
    # URI.open (Ruby 2.5+) replaces Kernel#open for URLs, which Ruby 3.0 no longer supports
    doc = Nokogiri::HTML(URI.open(url))
    label = doc.css('#boutiqueList')

    @label = label.css('#boutiqueList img').map { |l| p l.attr('src') }
    @title = label.css("#boutiqueList .notranslate").map { |o| p o.text }
  end
end


Answer 2:

You're not searching the parsed document correctly.

label = doc.css('#boutiqueList')

should be:

label = doc.at('#boutiqueList')

#boutiqueList is an ID, and an ID must be unique within a document. css returns a NodeSet, which behaves like an Array even when it holds a single match, but you really want the Node itself, which is what at returns. at is equivalent to search('...').first.

Then you use:

label.css('#boutiqueList img')

which is also wrong. label already points to the #boutiqueList node itself, but this selector asks Nokogiri to look inside that node for further nodes with id="boutiqueList" that contain <img> tags. Because #boutiqueList is an ID, the only element carrying it is label itself, and css only searches label's descendants, so Nokogiri finds nothing:

label.css('#boutiqueList img').size # => 0

whereas searching for img within label correctly finds the <img> nodes:

label.css('img').size # => 48

Then you use map to print out values, but map is for transforming each element: it returns a new Array built from the block's return values and leaves the original untouched. p happens to return the value it prints, so map { |l| p ... } still produces the right Array, but it's bad form to rely on p's return value inside a map. Instead, use map only to convert the values, then puts the result if you need to see it:

@label = label.css('img').map { |l| l.attr('src') }
puts @label
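A plain-Ruby sketch of the difference (the file names are made up):

```ruby
srcs = ["a.png", "b.png"]

# map builds a new Array from the block's return values; srcs is unchanged.
upcased = srcs.map { |s| s.upcase }

# p prints its argument and returns it, so this yields the same elements,
# but printing is now a side effect buried inside the transformation.
printed = srcs.map { |s| p s }

# Preferred: transform with map, then print the result separately.
names = srcs.map { |s| s.sub(/\.png\z/, '') }
puts names
```

Keeping map free of side effects means the same expression can be reused in a controller without spamming the server log.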

Instead of using attr('src'), I'd write the first line as:

@label = label.css('img').map { |l| l['src'] }

The same is true of:

@title = label.css("#boutiqueList .notranslate").map { |o| p o.text }