I created this script for a friend; it cycles through a real estate website and snags email addresses for her (for promotion). The site offers them freely, but it's inconvenient to grab them one at a time. The first script dumps each page's data into a txt file called webdump, and the second extracts the email addresses from that txt file. Save each of these in a separate .vbs file. If you want to test the script, you may want to change the following to a lower number (this is how many pages are processed):
Do while i < 1334
The first one errors a ways in and I'm not totally sure why, and the second one pulls out a little more than just the email addresses and, again, I'm not totally sure why. I'm not a highly skilled vbs guy, but those issues aren't related to my question... Question at the bottom...
' Script 1: dump the text of every results page into c:\webdump.txt
Set ie = CreateObject("InternetExplorer.Application")
Set objShell = CreateObject("WScript.Shell")
Set fso = CreateObject("Scripting.FileSystemObject")
Dim i
i = 0
Do while i < 1334
    i = i + 1
    ie.Navigate "http://www.reoagents.net/search-3.php?category=1&firmname=&business=&address=&zip=&phone=&fax=&mobile=&im=&manager=&mail=&www=&reserved_1=&reserved_2=&reserved_3=&filterbyday=ANY&loc_one=&loc_two=&loc_three=&loc_four=&location_text=&page=" & i
    ' Wait until IE reports the document as complete (readyState 4)
    Do Until ie.ReadyState = 4 : WScript.Sleep 10 : Loop
    pageText = ie.Document.Body.InnerText
    ' 8 = append mode; True = create the file if it does not exist
    Set ts = fso.OpenTextFile("c:\webdump.txt", 8, True)
    ts.Write pageText
    ts.Close
Loop
WScript.Echo "All site data copied!"
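The post does not say where the first script fails, so the following is only a guess at two common causes: the readystate test can run before the navigation has really started, and ts.write raises an error when a page contains characters that a default (ANSI) text file cannot hold. A minimal sketch of a more defensive loop body under those assumptions; WaitForPage and AppendUnicode are hypothetical helper names, not part of the original script:

```vbscript
Const ForAppending = 8
Const TristateTrue = -1   ' open the text file as Unicode

' Hypothetical helper: block until IE is idle and the page is fully loaded.
Sub WaitForPage(browser)
    Do While browser.Busy Or browser.ReadyState <> 4
        WScript.Sleep 100
    Loop
End Sub

' Hypothetical helper: append text to a Unicode file, creating it if needed.
Sub AppendUnicode(fso, path, text)
    Dim ts
    Set ts = fso.OpenTextFile(path, ForAppending, True, TristateTrue)
    ts.Write text
    ts.Close
End Sub

' Usage inside the page loop (ie, fso and url as in the script above):
'   ie.Navigate url
'   WaitForPage ie
'   AppendUnicode fso, "c:\webdump.txt", ie.Document.Body.InnerText
```

If the dump is written as Unicode, the second script would need to open it the same way, for example by passing -1 as the fourth argument of OpenTextFile, and any older ANSI webdump.txt should be deleted first so the two encodings are not mixed.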
And the second piece:
' Script 2: pull the lines containing "@" out of webdump.txt
Const ForReading = 1
Const ForAppending = 8   ' 8 = append (ForWriting would be 2)
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Pattern = "@"
Set objFSO = CreateObject("Scripting.FileSystemObject")
' Input file
Set objFileIn = objFSO.OpenTextFile("C:\webdump.txt", ForReading)
strOutputFile = "C:\cleanaddress.txt"
Do Until objFileIn.AtEndOfStream
    strSearchString = objFileIn.ReadLine
    Set colMatches = objRegEx.Execute(strSearchString)
    If colMatches.Count > 0 Then
        For Each strMatch In colMatches
            ' Output file
            Set objFileOut = objFSO.OpenTextFile(strOutputFile, ForAppending, True)
            ' Keep only the text before the first space, if there is one
            If InStr(strSearchString, " ") = 0 Then
                objFileOut.WriteLine strSearchString
            Else
                objFileOut.WriteLine Left(strSearchString, InStr(strSearchString, " ") - 1)
            End If
            objFileOut.Close
            Set objFileOut = Nothing
        Next
    End If
Loop
objFileIn.Close
WScript.Echo "Done!"
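One plausible reason the second script captures more than addresses is that the pattern "@" matches any line containing an @ sign, and the script then writes the first space-delimited word of that line, whether or not that word is the address. A minimal sketch that writes out only the matched text instead, assuming a simplified email pattern (an approximation, not a full address validator) and a made-up sample line:

```vbscript
Dim objRegEx, colMatches, objMatch, strLine
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True        ' find every address on a line, not just the first
objRegEx.IgnoreCase = True
' Simplified pattern: local part, @, domain, dot, top-level domain
objRegEx.Pattern = "[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}"

' Made-up sample line standing in for one line of webdump.txt
strLine = "Contact: jane.doe@example.com or office@example.org today"

Set colMatches = objRegEx.Execute(strLine)
For Each objMatch In colMatches
    WScript.Echo objMatch.Value   ' the matched address only
Next
```

In the extraction loop, objMatch.Value would be written to the output file in place of the first word of strSearchString.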
I'm able to cycle through the pages on that site easily because of the way the address is built: the last number in the address is sequential. However, now I want to try it with this address:
which seems to be JavaScript based. When I click through each page, the address doesn't change. Is it possible to do something similar to what I've done on the other site in this case?
Although not complete, not optimal, not bugfree, this could help:
Explanation:
navigate to the http(s) address;
go to the next page (ii+1 index) with a javascript ...__doPostBack call (the same as if one fills in the Jump to Page field and clicks the GO button);
not bugfree: it always requests the ii+1th page, so it fails on the last one.
Here is the true Jedi approach :) It uses only XMLHttpRequests, so it has none of IE's disadvantages and does not depend on it. The output window is created dynamically via mshta, without temp files. Processing speed can be improved by implementing async requests or a multiprocess environment. Unfortunately, the only way to stop the script at the moment is to terminate the wscript.exe process.
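As a rough illustration of the XMLHttpRequest idea, the sketch below fetches a single page synchronously with the MSXML2.XMLHTTP COM object instead of automating IE. The URL is a placeholder, not the site from the question, and for a site driven by __doPostBack the request would likely have to be a POST carrying the page's hidden form fields, which is not shown here:

```vbscript
Dim http, url
url = "http://www.example.com/"          ' placeholder URL (assumption)

Set http = CreateObject("MSXML2.XMLHTTP")
http.Open "GET", url, False              ' False = synchronous request
http.Send

If http.Status = 200 Then
    ' responseText holds the raw HTML of the page as a string
    WScript.Echo Len(http.responseText) & " characters received"
Else
    WScript.Echo "Request failed: " & http.Status & " " & http.statusText
End If
```

Because responseText is raw HTML rather than rendered text, the email pattern would be run against the markup directly.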