I used to use mechanize for scraping, but the website I am parsing now relies on JavaScript, which mechanize doesn't support, so I switched to selenium. I need to collect information about companies from this website, but the information only becomes available after clicking a JavaScript link. With selenium, my parser clicks the link and then collects the information, and this is where the problems start. I need to save the collected information to a database, and I can only do that properly if each piece is stored in its own variable (e.g. address=.., phone=.., email=.., etc.). I pick the selectors with SelectorGadget and collect the data with selenium (driver.find_element(:css, ..)), but for every company all of the information sits under a single selector (.p2 div), so I cannot save the address in one variable, the phone in another, and so on. So my question is: is it possible to split this text and save the parts in separate variables?
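To make the goal concrete, here is a minimal sketch of the kind of splitting I have in mind. It assumes the text of the block comes back as one "Label: value" pair per line; the sample text, the English labels, and the helper name `parse_contacts` are all hypothetical (the real page uses Russian labels and may be laid out differently).

```ruby
# Hypothetical sample of the text one contacts block might return;
# the real labels on the site are in Russian and may differ.
SAMPLE_TEXT = <<~TEXT
  Address: 12 Example St, Moscow
  Phone: +7 (495) 000-00-00
  E-mail: info@example.com
TEXT

# Split a block of "Label: value" lines into a hash keyed by the
# lowercased label. Lines without a colon are skipped.
def parse_contacts(text)
  text.each_line.with_object({}) do |line, fields|
    label, value = line.split(":", 2) # split on the first colon only
    next if value.nil?
    fields[label.strip.downcase] = value.strip
  end
end

fields  = parse_contacts(SAMPLE_TEXT)
address = fields["address"]
phone   = fields["phone"]
email   = fields["e-mail"]
```

With selenium, the input would presumably be the `.text` of the element found by the `.p2 div` selector rather than a hard-coded string.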
Photos that illustrate the process:
i.imgur.com/J5dcGZD.png
i.imgur.com/MaBWICZ.png
i.imgur.com/ZDNXhLt.png
Photo with part of html: http://i.imgur.com/NUa1X97.png
Here is an example page of this site. The site is in Russian, so translate it with Google Translate.
The parser itself (it saves a block of text for each company in the contacts variable):
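require 'rubygems'
require 'selenium-webdriver'

driver = Selenium::WebDriver.for :firefox
driver.get "http://www.ypag.ru/cat/komp249/page3880.html"

loop do
  # Click every JavaScript link so the hidden contact details are rendered.
  driver.find_elements(:css, ".p2 div a").each { |link| link.click }

  # find_elements returns a flat list, so a |n, r, c| block only ever fills n.
  # Query each selector separately and zip the results instead.
  # Assumption: every company on the page has exactly one match per selector,
  # and the selectors map to name, region and contacts in this order.
  names          = driver.find_elements(:css, ".p3 a")
  regions        = driver.find_elements(:css, ".firm")
  contact_blocks = driver.find_elements(:css, ".p2 div")

  names.zip(regions, contact_blocks).each do |name, region, contacts|
    print name.text.center(100)
    puts region.text
    puts contacts.text
  end

  # Follow the pagination link until the last page to process is reached.
  link = driver.find_element(:xpath, "/html/body/table[5]/tbody/tr/td/a[2]")[:href]
  break if link == "http://www.ypag.ru/cat/komp249/page3780.html"
  driver.get link
end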