Using selenium-webdriver for parsing (Ruby)

Posted 2019-08-20 17:09

Previously I used mechanize for parsing, but now I'm parsing a website that uses JavaScript, which mechanize doesn't support, so I switched to selenium. I have to collect information about companies from this website, but the information only appears after clicking a JavaScript link. I got that part working with selenium: my parser clicks the JavaScript links and then collects the information. This is where the problems start.

I need to save the collected information to a database, and I can only do that properly if it is stored in separate variables (e.g. address=.., phone=.., email=.., etc.). I pick the selectors with SelectorGadget and selenium collects the information (driver.find_element(:css, ..)), but all the company information is located under a single selector (.p2 div), so I cannot save the location in one variable, the phone in another, and so on. So my question is: is it possible to split this text and save the parts in separate variables?

Photos that illustrate the process:

i.imgur.com/J5dcGZD.png

i.imgur.com/MaBWICZ.png

i.imgur.com/ZDNXhLt.png

Photo with part of html: http://i.imgur.com/NUa1X97.png

Here is an example page of this site. The site is in Russian, so translate it with Google Translate.

The parser itself (it saves a block of text for each company to the contacts variable):

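require 'rubygems'
require 'selenium-webdriver'

driver = Selenium::WebDriver.for :firefox
driver.get "http://www.ypag.ru/cat/komp249/page3880.html"

loop do
  # Click every JavaScript link so the hidden contact details are rendered.
  driver.find_elements(:css, ".p2 div a").each { |link| link.click }

  # find_elements yields one element per iteration, so a block with three
  # parameters leaves the last two nil. Fetch each group separately and pair
  # them by position (this assumes the three lists line up on the page).
  names    = driver.find_elements(:css, ".p3 a")
  regions  = driver.find_elements(:css, ".firm")
  contacts = driver.find_elements(:css, ".p2 div")

  names.zip(regions, contacts).each do |name, region, contact|
    puts name.text.center(100)
    puts region.text if region
    puts contact.text if contact
  end

  # Follow the "next page" link until the last page of interest is reached.
  link = driver.find_element(:xpath, "/html/body/table[5]/tbody/tr/td/a[2]").attribute("href")
  break if link == "http://www.ypag.ru/cat/komp249/page3780.html"
  driver.get link
end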
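
One possible way to split the collected text, shown only as a rough sketch: if each company's block has labelled lines, a regular expression per field can pull the values into separate variables. The labels used here ("Адрес:", "Телефон:"), the constant FIELD_PATTERNS, the helper split_contacts and the sample text are all hypothetical; the real page may use different wording or ordering, so the patterns would need adjusting against the actual output of contacts.text.

```ruby
# Hypothetical field labels: adjust them to whatever the real page prints.
FIELD_PATTERNS = {
  address: /Адрес:[ \t]*(.+)/,
  phone:   /Телефон:[ \t]*(.+)/,
  email:   /([\w.+-]+@[\w-]+(?:\.[\w-]+)+)/
}.freeze

# Returns a hash with one entry per field; a field is nil when its
# pattern does not match the given text.
def split_contacts(text)
  FIELD_PATTERNS.each_with_object({}) do |(field, pattern), result|
    match = text.match(pattern)
    result[field] = match && match[1].strip
  end
end

# Made-up sample text standing in for contacts.text from the parser.
sample = "Адрес: г. Москва, ул. Примерная, д. 1\n" \
         "Телефон: (495) 000-00-00\n" \
         "E-mail: info@example.com"

fields  = split_contacts(sample)
address = fields[:address]
phone   = fields[:phone]
email   = fields[:email]

puts address
puts phone
puts email
```

In the parser, split_contacts(contact.text) could be called inside the loop for each company, and the resulting hash passed on to whatever code writes to the database. Fields that come back as nil indicate that the assumed label did not appear in that company's text.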
