Scrapy: a simple way to get around a tiny javascri

Posted 2019-07-14 01:39

EDIT: I've asked a closely related question here.

I'm scraping data from a website with Scrapy. It seemed pretty straightforward, except that the page I eventually wanted came back looking like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
  <script type="text/javascript">
    function get_tz()
    {
      var now = new Date()
      document.tz_form.offset.value = now.getTimezoneOffset()
      document.tz_form.submit()
    }
  </script>
  <object><noscript>
    <p>&nbsp;&nbsp;&nbsp;&nbsp;  Javascript support is needed for this page (to get your local timezone).<br />
    &nbsp;&nbsp;&nbsp;&nbsp;  Your browser either has no Javascript support,
 or has such support disabled.<br />
    &nbsp;&nbsp;&nbsp;&nbsp;  As an alternative, click <a href="http://www.bridgebase.com/myhands/hands.php?offset=0">here</a> to continue.<br />
  All times will be GMT.</p>  </noscript></object>
</head>
  <body onload='get_tz()'>
   <form name="tz_form" action="/myhands/hands.php?traveller=5043-1453474920-72755316" method="post">
      <input type='hidden' name='offset' />
    </form>
  </body>
</html>

I'm pretty new to web scraping. I've read a bit about getting around this with Splash or Selenium, but I wanted to ask whether there is an easier way before diving deeper. All I need to do is supply this bit of time zone information, though I'm not sure it's as easy as it looks.

My spider is a little verbose because of authentication, but I can post it if people think it would help. I figured it wouldn't be too critical here.
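
For reference, all the page's JavaScript appears to do is fill the hidden `offset` field of `tz_form` with the browser's time zone offset in minutes and submit the form by POST, so the same request can probably be built by hand. Below is a minimal standard-library sketch of that idea, not a tested fix for this site: the helper names `TzFormParser` and `build_tz_post` are made up for illustration, and it assumes the server accepts `offset=0` (GMT), as the page's own `<noscript>` fallback link suggests.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class TzFormParser(HTMLParser):
    """Hypothetical helper: records the action and method of the form named tz_form."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == "tz_form":
            self.action = attrs.get("action")
            self.method = (attrs.get("method") or "get").upper()


def build_tz_post(page_url, html, offset_minutes=0):
    """Return (target_url, method, body) for the request the page's script would submit.

    offset_minutes mimics JavaScript's Date.getTimezoneOffset(): UTC minus
    local time, in minutes (0 for GMT, 300 for UTC-5).
    """
    parser = TzFormParser()
    parser.feed(html)
    if parser.action is None:
        raise ValueError("tz_form not found in page")
    target = urljoin(page_url, parser.action)  # resolve the relative action URL
    body = urlencode({"offset": str(offset_minutes)})
    return target, parser.method, body


if __name__ == "__main__":
    sample = (
        "<body onload='get_tz()'>"
        "<form name=\"tz_form\" "
        "action=\"/myhands/hands.php?traveller=5043-1453474920-72755316\" "
        "method=\"post\"><input type='hidden' name='offset' /></form></body>"
    )
    print(build_tz_post("http://www.bridgebase.com/myhands/hands.php", sample))
```

Inside a Scrapy callback, the equivalent would presumably be a single call such as `scrapy.FormRequest.from_response(response, formname="tz_form", formdata={"offset": "0"}, callback=...)`, which reads the form's action and method from the response itself. Whether the site then returns the real page depends on the authenticated session cookies being sent along, which is an assumption here.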
