EDIT: I've asked a closely related question here.
I'm scraping data from a website using Scrapy, which seemed pretty straightforward, except that the page I eventually wanted came back looking like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<script type="text/javascript">
function get_tz()
{
var now = new Date()
document.tz_form.offset.value = now.getTimezoneOffset()
document.tz_form.submit()
}
</script>
<object><noscript>
<p> Javascript support is needed for this page (to get your local timezone).<br />
Your browser either has no Javascript support,
or has such support disabled.<br />
As an alternative, click <a href="http://www.bridgebase.com/myhands/hands.php?offset=0">here</a> to continue.<br />
All times will be GMT.</p> </noscript></object>
</head>
<body onload='get_tz()'>
<form name="tz_form" action="/myhands/hands.php?traveller=5043-1453474920-72755316" method="post">
<input type='hidden' name='offset' />
</form>
</body>
</html>
I'm pretty new to web scraping. I've read that Splash or Selenium can get around this kind of JavaScript check, but before diving deeper I wanted to ask whether there is an easier way. As far as I can tell, all the page does is fill the hidden offset field with my time zone offset and POST the form, so all I need to do is supply that one value. I'm not sure if it's as easy as it looks...
My spider is a little verbose because of authentication, but I can post it if people think it would help. I figured it wouldn't be too critical here.
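To make the question concrete, here is a rough sketch of what I think the JavaScript boils down to: read the form's action and hidden field, set offset myself, and build the POST by hand. It uses only the standard library and sends nothing. The names FormParser and build_tz_request are my own, the page URL passed in is an assumption based on the link in the noscript block, and I haven't confirmed that the server accepts this without the session cookies from my login.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Trimmed copy of the form from the page above.
SAMPLE_HTML = """
<body onload='get_tz()'>
<form name="tz_form" action="/myhands/hands.php?traveller=5043-1453474920-72755316" method="post">
<input type='hidden' name='offset' />
</form>
</body>
"""


class FormParser(HTMLParser):
    """Collect the action, method, and input fields of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "get"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Inputs without a value attribute are submitted as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_tz_request(page_html, page_url, offset_minutes=0):
    """Build (but do not send) the request that get_tz() would submit.

    offset_minutes mirrors Date.getTimezoneOffset(), which is in minutes;
    0 corresponds to GMT, as the noscript fallback link suggests.
    """
    parser = FormParser()
    parser.feed(page_html)
    fields = dict(parser.fields)
    fields["offset"] = str(offset_minutes)
    url = urljoin(page_url, parser.action)  # resolve the relative action
    body = urlencode(fields).encode("ascii")
    return Request(url, data=body, method=parser.method.upper())


if __name__ == "__main__":
    # Assumed page URL; the real one is whatever my spider requested.
    req = build_tz_request(SAMPLE_HTML, "http://www.bridgebase.com/myhands/hands.php")
    print(req.get_method(), req.full_url, req.data)
```

Inside the spider I would guess the equivalent is something like returning FormRequest.from_response(response, formname="tz_form", formdata={"offset": "0"}) from the callback, so that Scrapy keeps my login cookies, but I haven't tested that.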