Hello, I have installed ScrapyJS + Splash and I use the following code:
import json
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spider import Spider
from scrapy.selector import Selector
import urlparse, random

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["whoscored.com"]
    start_urls = ['http://www.whoscored.com/Regions/81/Tournaments/3/Seasons/4336/Stages/9192/Fixtures/Germany-Bundesliga-2014-2015']

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, self.parse, meta={
                'splash': {
                    'endpoint': 'render.html',
                    'args': {'wait': 0.5}
                }
            })

    def parse(self, response):
        cnt = 0
        with open('links2.txt', 'a') as f:
            while True:
                try:
                    data = ''.join(Selector(text=response.body).xpath('//a[@class="match-link match-report rc"]/@href')[cnt].extract())
                    data = "https://www.whoscored.com"+data
                except:
                    break
                f.write(data+'\n')
                cnt += 1
So far it works fine, but now I would like to click the 'previous' button in a controller that has neither an id nor a real href.
I have tried
splash:runjs("$('#date-controller').click()")
and
splash:runjs("window.location = document.getElementsByTagName('a')[64].href")
but neither worked.
Here's a basic (yet working) example of how to pass JavaScript code in a Lua script for Splash, using the /execute endpoint.
Sample logs:
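The approach described above (JavaScript handed to a Lua script through the /execute endpoint) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the original script: the helper name `build_splash_meta`, the extra `js_source` argument, and the CSS selector for the 'previous' button are all hypothetical and would need adjusting to the real page. It relies on Splash exposing request arguments to the script as `splash.args`, and on ScrapyJS filling in the `url` argument from the request.

```python
import json

# Lua script run by Splash's /execute endpoint. It loads the page, waits,
# runs the JavaScript passed in as an argument, waits again so the page
# can update, and returns the resulting HTML.
LUA_SOURCE = """
function main(splash)
    assert(splash:go(splash.args.url))
    assert(splash:wait(splash.args.wait))
    assert(splash:runjs(splash.args.js_source))
    assert(splash:wait(splash.args.wait))
    return splash:html()
end
"""

# Hypothetical selector for the 'previous' button; inspect the page to
# find one that actually matches.
CLICK_PREVIOUS_JS = "document.querySelector('#date-controller a.previous').click();"


def build_splash_meta(js_source, wait=0.5):
    """Build the request meta dict that routes a request through /execute."""
    return {
        'splash': {
            'endpoint': 'execute',
            'args': {
                'lua_source': LUA_SOURCE,
                'js_source': js_source,  # read in Lua as splash.args.js_source
                'wait': wait,            # read in Lua as splash.args.wait
            },
        }
    }


meta = build_splash_meta(CLICK_PREVIOUS_JS)
# Show the arguments that would be sent to Splash (the Lua source is omitted
# here to keep the output short).
print(json.dumps({k: v for k, v in meta['splash']['args'].items()
                  if k != 'lua_source'}, indent=2))
```

In the spider, the dict would replace the render.html meta in start_requests, for example `yield scrapy.Request(url, self.parse, meta=build_splash_meta(CLICK_PREVIOUS_JS))`. Keeping the JavaScript in a separate argument avoids escaping quotes inside the Lua string.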