I'm trying to scrape information from my company's intranet so that I can display it on our office wallboard via a Dashing dashboard. I'm trying to work from the information provided on this site. The problem I'm having, other than being a noob, is that to access the information I want to scrape, I need to log in to our intranet: I provide my username on one page, which submits to another page where I provide my password. Once I'm logged in, I can follow links and scrape my data.
Here is some source code from my login username page:
<form action='loginauthpwd.asp?PassedURL=' method='post' style='margin: 0px;'>
<table border='0' cellspacing='1' width='999' height='350'><tr><td width='100'> </td><td valign='center' width='100'>
<table style='width: 350px; background-color: #EEEEEE; border: 1px solid gray;'><tr><td class='fontBlack' style='padding: 10px; vertical-align: top;'>
<span style='font-weight: bold;'>Username:</span><br>
<input type='text' class='normal' autocomplete='off' id='LoginUser' name='LoginUser' style='border: 1px solid gray; height: 16px; font-family: arial; font-size: 11; width: 180px;' maxlength='30'>
<input class='normal_button' type='button' value='Go' style='border: 1px solid gray; font-weight: bold; width: 80px; margin-left: 10px;' onclick="var username=document.getElementById('LoginUser').value; if (username.length > 2) { submit(); } else { alert('Enter your Username.'); }">
</form>
Here is some source from my login password page:
<form action='loginauthprocess.asp?UserName=******&Page=&PassedURL=' target='_top' method='post' onsubmit='checkMyBrowser();' style='margin: 0px;'>
<table border='0' cellspacing='1' width='999' height='350'><tr><td width='100'> </td><td valign='center' width='100'>
<table style='width: 350px; background-color: #EEEEEE; border: 1px solid gray;'><tr><td class='fontBlack' style='padding: 10px; vertical-align: top;'>
<span style='font-weight: bold;'>Password:</span><br>
<input class='normal' type='password' autocomplete='off' id='LoginPassword' name='LoginPassword' style='border: 1px solid gray; height: 16px; font-family: arial; font-size: 11; width: 180px;' maxlength='30'>
<input class='normal_button' type='submit' value='Log In' style='border: 1px solid gray; font-weight: bold; width: 80px; margin-left: 10px;' onclick="var password=document.getElementById('LoginPassword').value; if (password.length > 2) { submit(); } else { alert('Enter your Password.'); }">
</form>
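The two things a scraper needs from these forms are each input's `name` attribute (that is how Mechanize identifies a field) and each form's `action` (where it submits, resolved against the URL of the page the form came from). As a rough, stdlib-only sketch, this pulls both out of trimmed copies of the forms above. It uses a regex scan, which is fine for these snippets but is not a real HTML parser, and it assumes the start URL `loginauthusr.asp?Page=` from the later edits; `form_summary` is a hypothetical helper.

```ruby
require 'uri'

# Trimmed copies of the two forms above (styles and scripts removed).
USERNAME_FORM = <<~HTML
  <form action='loginauthpwd.asp?PassedURL=' method='post'>
  <input type='text' class='normal' id='LoginUser' name='LoginUser'>
  <input class='normal_button' type='button' value='Go'>
  </form>
HTML

PASSWORD_FORM = <<~HTML
  <form action='loginauthprocess.asp?UserName=******&Page=&PassedURL=' target='_top' method='post'>
  <input class='normal' type='password' id='LoginPassword' name='LoginPassword'>
  <input class='normal_button' type='submit' value='Log In'>
  </form>
HTML

# Hypothetical helper: rough regex scan for the form action and input names.
def form_summary(html)
  action = html[/<form[^>]*\baction='([^']*)'/, 1]
  names  = html.scan(/<input[^>]*\bname='([^']*)'/).flatten
  { action: action, field_names: names }
end

page1_url = 'http://www.website_here.com/intranet/loginauthusr.asp?Page='
step1 = form_summary(USERNAME_FORM)
step2 = form_summary(PASSWORD_FORM)

# A relative action is resolved against the URL of the page holding the form.
page2_url = URI.join(page1_url, step1[:action]).to_s
final_url = URI.join(page2_url, step2[:action]).to_s

p step1[:field_names] # ["LoginUser"]
p step2[:field_names] # ["LoginPassword"]
puts page2_url        # http://www.website_here.com/intranet/loginauthpwd.asp?PassedURL=
puts final_url
```

The field names printed here are the strings you would pass to Mechanize, not labels like "USER NAME HERE".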
Using that site's example, this is what I think should work, but it doesn't seem to:
require 'mechanize'
@agent = Mechanize.new
@agent.verify_mode = OpenSSL::SSL::VERIFY_NONE
##Login Page:
page = @agent.get 'http://www.website_here.com/intranet/login.asp'
##Username Page:
form = page.forms[0]
form['USER NAME HERE'] = LoginUser
##Submit User:
page = form.submit
##Password Page:
form = page.forms[0]
form['USER PASSWORD HERE'] = LoginPassword
##Submit Password:
page = form.submit
When I test my code I get the following output:
test.rb:10:in `<main>': uninitialized constant LoginUser (NameError)
Can anyone point out what I'm doing wrong?
Thanks
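The NameError comes from the `form['USER NAME HERE'] = LoginUser` line: in Ruby, a bare capitalized word like `LoginUser` is a constant, and none is defined. The key and value are also swapped: the key should be the field's `name` attribute and the value should be your username, both as strings (the password line has the same problem). A plain Hash stands in for the form below, purely to show the Ruby side; `'my_username'` is a placeholder.

```ruby
# A plain Hash standing in for a form's fields, keyed by name attribute.
fields = { 'LoginUser' => '' }

# As originally written: bare LoginUser is looked up as a constant and fails.
message = begin
  fields['USER NAME HERE'] = LoginUser
rescue NameError => e
  e.message
end
puts message.lines.first # uninitialized constant LoginUser

# Fixed: the field name is the key, your username (a String) is the value.
fields['LoginUser'] = 'my_username'
p fields
```

With a real Mechanize form the fixed line reads the same way: `form['LoginUser'] = 'my_username'`.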
EDIT 3/27/15:
Using @seoyoochan's resource, I tried to write my code like this:
require 'rubygems'
require 'mechanize'
login_page = agent.get "http://www.website_here.com/intranet/loginauthusr.asp?Page="
login_form = login_page.form_with(action: '/sessions')
user_field = login_form.field_with(name: "session[user]")
user.value = 'My User Name'
login_form.submit
When I try to run my code, I'm now getting this output:
test.rb:4:in `<main>': undefined local variable or method `agent' for main:Object (NameError)
I need an example of how to use the right field names/classes so that my code works with the form I provided.
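Two things break in this version. `agent` is never created (it needs `agent = Mechanize.new` after the `require` lines), which is the NameError above; `user.value` should also be `user_field.value`. Beyond that, `action: '/sessions'` and `session[user]` come from the article's example site, not your intranet, so nothing would match. As I understand Mechanize's matcher, each criterion is compared with `===`, so a String must equal the attribute exactly while a Regexp only has to match part of it. This plain-Ruby sketch replays that comparison against your forms' actions; `first_action_matching` is a hypothetical helper, not a Mechanize method.

```ruby
# The action attributes of the two forms in the question.
FORM_ACTIONS = [
  'loginauthpwd.asp?PassedURL=',
  'loginauthprocess.asp?UserName=******&Page=&PassedURL='
].freeze

# Hypothetical helper: first action that a criterion matches via ===.
def first_action_matching(criterion)
  FORM_ACTIONS.find { |action| criterion === action }
end

from_article = first_action_matching('/sessions') # article's site
exact        = first_action_matching('loginauthpwd.asp?PassedURL=')
by_pattern   = first_action_matching(/loginauthpwd\.asp/)

p from_article # nil
p exact        # "loginauthpwd.asp?PassedURL="
p by_pattern   # "loginauthpwd.asp?PassedURL="
```

So with Mechanize the lookup would be along the lines of `login_page.form_with(action: /loginauthpwd\.asp/)` followed by `field_with(name: 'LoginUser')`.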
EDIT 4/4/15:
Okay, now using @tylermauthe's example, I'm trying to test the following code:
require 'mechanize'
require 'io/console'
agent = Mechanize.new
page = agent.get('http://www.website_here.com/intranet/loginauthusr.asp?Page=')
form = page.forms.find{|form| form.action.include?("loginauthpwd.asp?PassedURL=")}
puts "Login:"
form.login = gets.chomp
page = agent.submit(form)
pp page
I expect this code to let me enter and submit my username, which should bring me to the next page where I'm asked for my password. But when I run it and enter my username, I get the following output:
/var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize/form.rb:217:in `method_missing': undefined method `loginUser=' for # (NoMethodError)
    from scraper.rb:10:in `<main>'
What am I missing or doing wrong? Please refer to my first edit to see how my form is coded. Also, to be clear, I did not write these forms. I'm only trying to learn how to code and scrape the data I need to display on my Dashing dashboard project.
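Mechanize lets you set a field as if it were an attribute (for example `form.q = 'ruby'` for a field named `q`), but only when the method name matches a field's `name` exactly, including case. Your fields are `LoginUser` and `LoginPassword`, so `form.login` and `form.loginUser` both miss and fall through to NoMethodError. The class below is a hypothetical, simplified stand-in (not Mechanize's source) that mimics that behavior.

```ruby
# ToyForm is a hypothetical, simplified stand-in (not Mechanize's source)
# that mimics how form.some_name = value is mapped onto a field name.
class ToyForm
  def initialize(field_names)
    @fields = field_names.each_with_object({}) { |name, h| h[name] = '' }
  end

  # Explicit style: form['LoginUser'] = 'me'
  def []=(name, value)
    raise ArgumentError, "no field named #{name.inspect}" unless @fields.key?(name)
    @fields[name] = value
  end

  def [](name)
    @fields[name]
  end

  # Accessor style: only works when the method name matches a field name
  # exactly, including case; anything else raises NoMethodError.
  def method_missing(meth, *args)
    name = meth.to_s.chomp('=')
    return super unless @fields.key?(name)
    args.empty? ? @fields[name] : (@fields[name] = args.first)
  end

  def respond_to_missing?(meth, include_private = false)
    @fields.key?(meth.to_s.chomp('=')) || super
  end
end

form = ToyForm.new(['LoginUser'])

missed = nil
begin
  form.loginUser = 'me' # lowercase "l": no field has this name
rescue NoMethodError => e
  missed = e.name
end
puts "missed: #{missed}" # missed: loginUser=

form['LoginUser'] = 'me'
puts form['LoginUser'] # me
```

With the real library, `form['LoginUser'] = gets.chomp` is the explicit equivalent and avoids guessing at method names.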
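I just looked into the Mechanize gem and found a relevant solution: Mechanize works with input fields through their 'name' attribute, so you must use each field's proper name (in your forms, 'LoginUser' and 'LoginPassword'). Otherwise you can't set values on them. Follow this article: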
http://crabonature.pl/posts/23-automation-with-mechanize-and-ruby
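To see why the `name` attribute matters: when a POST form like these is submitted, each named field is sent as a URL-encoded `name=value` pair, so the server reads your username from the `LoginUser` key. A minimal stdlib sketch of the bodies the two forms would post; the username and password are made-up placeholders.

```ruby
require 'uri'

# Placeholder credentials, for illustration only.
username = 'jdoe'
password = 'p@ss&word'

# Step 1 posts to loginauthpwd.asp; step 2 posts to loginauthprocess.asp.
step1_body = URI.encode_www_form('LoginUser' => username)
step2_body = URI.encode_www_form('LoginPassword' => password)

puts step1_body # LoginUser=jdoe
puts step2_body # LoginPassword=p%40ss%26word
```

Mechanize builds bodies like these for you from the form's fields, which is why an input it cannot identify by name never gets a value sent.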
I was able to log in with the following example. Thanks to everyone who helped me with all the resources and examples to learn from!
I found the following example from user Senthess here. I'm still not 100% sure what each part of the code is doing, so if anyone would like to take the time to break it down, please do. This will help me and others understand it better.
Thanks!
Not sure if you found these, but Mechanize has fairly excellent docs: http://docs.seattlerb.org/mechanize/GUIDE_rdoc.html
From these, I played around in the irb REPL to create this simple scraper that logs into GitHub: https://gist.github.com/tylermauthe/781f68add24819e207c4
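Pulling the fixes from this thread together, here is a sketch of the full two-step login using Mechanize's `get`, `form_with`, `field_with` and `submit`. It assumes the start URL from the question's later edits and that each form can be picked out by a pattern on its `action`; the method name `intranet_login` is hypothetical. The agent is passed in, so only the usage part needs the gem.

```ruby
# Hypothetical helper: logs in through the intranet's two-step form and
# returns the page shown after the password is accepted. `agent` should be
# a Mechanize instance; it keeps the session cookies for later agent.get calls.
def intranet_login(agent, start_url, username, password)
  # Page 1: the username form posts to loginauthpwd.asp.
  page = agent.get(start_url)
  user_form = page.form_with(action: /loginauthpwd\.asp/) or
    raise 'username form not found'
  user_form.field_with(name: 'LoginUser').value = username
  page = user_form.submit

  # Page 2: the password form posts to loginauthprocess.asp.
  pwd_form = page.form_with(action: /loginauthprocess\.asp/) or
    raise 'password form not found'
  pwd_form.field_with(name: 'LoginPassword').value = password
  pwd_form.submit
end

# Usage (after `gem install mechanize`):
#   require 'mechanize'
#   require 'io/console'
#
#   agent = Mechanize.new
#   print 'Username: '
#   user = gets.chomp
#   print 'Password: '
#   pass = STDIN.noecho(&:gets).chomp # hides the typed password
#   puts
#   home = intranet_login(agent,
#     'http://www.website_here.com/intranet/loginauthusr.asp?Page=', user, pass)
#   pp home
```

Matching the password form's action with a pattern rather than the exact string means it is still found even though its `UserName=` query value differs per user.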