Extracting elements with Nokogiri

Posted 2019-05-21 11:33

Question:

I was wondering if someone could help with the following. I am using Nokogiri to scrape some data from http://www.bbc.co.uk/sport/football/tables

I would like to get the league table info. So far I've got this:

def get_league_table # Get me Premier League Table
  doc = Nokogiri::HTML(open(FIXTURE_URL))
  table = doc.css('.table-stats')
  teams = table.xpath('following-sibling::*[1]').css('tr.team')
  teams.each do |team|
    position = team.css('.position-number').text.strip
    League.create!(position: position)
  end
end

So I thought I would grab .table-stats and then get each row in the table with a class of team. These rows contain all the info I need, such as position number, played, team name, etc.

Once I have the tr.team rows, I thought I could loop over them to grab the relevant info.

It's the XPath part I am stuck on (unless I'm approaching the whole thing wrong?): how do I get to the tr.team rows from .table-stats?

Could anyone offer any pointers please?

Thanks

Answer 1:

This is a script I wrote to parse tables dynamically. I adapted it to your case:

require 'open-uri'
require 'nokogiri'

url = 'http://www.bbc.co.uk/sport/football/tables'
# URI.open comes from open-uri; a bare `open` no longer fetches URLs on Ruby 3.0+
doc = Nokogiri::HTML.parse(URI.open(url))
teams = doc.search('tbody tr.team')

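# Use each cell's class attribute in the first row as a hash key
# (to_s guards against cells that have no class attribute)
keys = teams.first.search('td').map do |k|
  k['class'].to_s.tr('-', '_').to_sym
end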

# Build one hash per team row, pairing each key with that cell's text
hsh = teams.map do |team|
  keys.zip(team.search('td').map(&:text)).to_h
end

puts hsh