I'm trying to scrape the International Maritime Organization's data (https://gisis.imo.org/Public/PAR/Search.aspx) on attacks on shipping vessels with incident dates between 2002-01-01 and 2005-12-31 (the "is between" option in the site's search form).
I've previously used the bs4 and requests modules in Python to scrape financial data from Yahoo and weather data from Wunderground, but this site requires a login and password (under the "public" account type). Furthermore, the data sits behind a search / filter that has to be run before I can access the HTML of the results page.
Once I click on a row in the search results, it expands into a detail view. (Before anyone asks why I don't just download the dataset and pull from there: the download is filtered for some reason, and not all the columns are included, for example the IMO number.)
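A possible starting point for the login and search steps, as a rough sketch rather than a tested solution. The .aspx URL and the ctl00_... control ids suggest an ASP.NET WebForms site, where forms are usually submitted as postbacks that must echo hidden state fields such as __VIEWSTATE. The helper names form_payload and postback are made up for illustration, and the real login URL and form field names are assumptions that have to be read from the page source or the browser's network tab.

```python
import requests
from bs4 import BeautifulSoup


def form_payload(html, overrides):
    """Collect the page's hidden inputs, then add or replace our own values."""
    soup = BeautifulSoup(html, "html.parser")
    payload = {}
    for field in soup.find_all("input", attrs={"type": "hidden"}):
        name = field.get("name")
        if name:
            payload[name] = field.get("value", "")
    payload.update(overrides)
    return payload


def postback(session, url, overrides):
    """GET the page to pick up its hidden state, then POST it back with our values.

    `session` is expected to be a requests.Session so that the login cookie
    set by one postback is sent along with the next request.
    """
    page = session.get(url, timeout=30)
    page.raise_for_status()
    response = session.post(url, data=form_payload(page.text, overrides), timeout=30)
    response.raise_for_status()
    return response
```

The idea would be to create one requests.Session(), call postback once against the login page with the username and password fields, and then call it again against Search.aspx with the date filter fields, reusing the same session both times. If the search or the row expansion turns out to be driven by JavaScript instead of a plain form post, this approach will not be enough on its own.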
Ultimately, the data I am trying to pull is from this detail view, and I need the following (item, then CSS path):
position of incident
#ctl00_bodyPlaceHolder_ctl00_pnlDetail > table:nth-child(4) > tbody > tr:nth-child(1) > td:nth-child(2) > span
date
#ctl00_bodyPlaceHolder_ctl00_pnlDetail > table:nth-child(4) > tbody > tr:nth-child(6) > td.content > span
ship name
#ctl00_bodyPlaceHolder_ctl00_pnlDetail > table:nth-child(4) > tbody > tr:nth-child(4) > td:nth-child(2) > span
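Once the detail HTML is in hand, the three paths above could be applied with bs4 roughly as follows. This is a minimal sketch: extract_fields is a hypothetical helper name, and the layout of the real page has not been verified. One thing to watch for is that selectors copied from a browser often contain a tbody that the browser inserted itself; if the raw HTML has none, the selector matches nothing, so the sketch retries without it.

```python
from bs4 import BeautifulSoup

# Common prefix of the three CSS paths listed above.
PANEL = "#ctl00_bodyPlaceHolder_ctl00_pnlDetail > table:nth-child(4) > tbody > "
SELECTORS = {
    "position": PANEL + "tr:nth-child(1) > td:nth-child(2) > span",
    "date": PANEL + "tr:nth-child(6) > td.content > span",
    "ship_name": PANEL + "tr:nth-child(4) > td:nth-child(2) > span",
}


def extract_fields(html):
    """Return the text of each selected span, or None where nothing matches."""
    soup = BeautifulSoup(html, "html.parser")
    record = {}
    for key, selector in SELECTORS.items():
        node = soup.select_one(selector)
        if node is None:
            # Retry without the tbody step in case the raw HTML lacks it.
            node = soup.select_one(selector.replace(" > tbody", ""))
        record[key] = node.get_text(strip=True) if node is not None else None
    return record
```

Calling extract_fields on the HTML of one incident's detail view would give a dict with the keys position, date and ship_name, which could then be appended to a list and written out with the csv module.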
Needless to say, this seems like a daunting task. Any recommendations?
Here is the old code I've been using to scrape the weather data (I haven't changed anything yet because I don't know where to start with the login/filter process): http://pythonfiddle.com/get-wx-data