Nokogiri price doesn't show

Posted 2019-08-20 08:27

Question:

Can anyone explain how I can retrieve the value of the price using Nokogiri? The values that appear in the scraper I created are these:

Costa Rica
<span class="ProductListElement__price"> </span>
India
<span class="ProductListElement__price"> </span>
Indonesia
<span class="ProductListElement__price"> </span>
Colombia
<span class="ProductListElement__price"> </span>
Nicaragua
<span class="ProductListElement__price"> </span>
Ethiopia
<span class="ProductListElement__price"> </span>
Master Origin Pack (50 cápsulas)
<span class="ProductListElement__price"> </span>
CAFÉ İSTANBUL
<span class="ProductListElement__price"> </span>
Envivo Lungo
<span class="ProductListElement__price"> </span>
Fortissio Lungo
<span class="ProductListElement__price"> </span>
Vivalto Lungo
<span class="ProductListElement__price"> </span>
Linizio Lungo
<span class="ProductListElement__price"> </span>
Livanto
<span class="ProductListElement__price"> </span>
Capriccio
<span class="ProductListElement__price"> </span>
Volluto
<span class="ProductListElement__price"> </span>
Cosi
<span class="ProductListElement__price"> </span>
Kazaar
<span class="ProductListElement__price"> </span>
Dharkan
<span class="ProductListElement__price"> </span>
Ristretto
<span class="ProductListElement__price"> </span>
Arpeggio
<span class="ProductListElement__price"> </span>
Roma
<span class="ProductListElement__price"> </span>
Ristretto Decaffeinato
<span class="ProductListElement__price"> </span>
Arpeggio Decaffeinato
<span class="ProductListElement__price"> </span>
Volluto Decaffeinato
<span class="ProductListElement__price"> </span>
Vivalto Lungo Decaffeinato
<span class="ProductListElement__price"> </span>
Vanilio
<span class="ProductListElement__price"> </span>
Caramelito
<span class="ProductListElement__price"> </span>

My controller is this:

require 'open-uri'
require 'nokogiri'

class CupsController < ApplicationController
  # Simple value object holding one scraped product.
  class Entry
    attr_reader :name, :price

    def initialize(name, price)
      @name = name
      @price = price
    end
  end

  def cups
    # URI.open comes from open-uri; a bare Kernel#open no longer accepts URLs on Ruby 3+.
    doc = Nokogiri::HTML(URI.open('https://www.nespresso.com/pt/pt/order/capsules'))
    entries = doc.css('article.ProductListElement')
    @entriesArray = []
    entries.each do |entry|
      name = entry.css('.ProductListElement__name').text
      # No .text call here, so price is a Nokogiri NodeSet and prints as raw <span> markup.
      price = entry.css('span.ProductListElement__price')

      @entriesArray << Entry.new(name, price)
      # After the loop these hold the values of the last entry only.
      @name = name
      @price = price
    end
    render template: 'cups/home'
  end
end

Answer 1:

This is what I'm receiving for each result:

<span class="ProductListElement__price"> </span>

which suggests to me that the prices are dynamically loaded by JavaScript once the webpage has loaded.

To scrape dynamically loaded data, you will need a browser-automation library such as Watir, a Ruby gem that drives a real browser and can be used from a Rails 5 app.

With Watir, you can wait until the page's scripts have run and the data has loaded before attempting to scrape the page.