I was just wondering why this might be occurring. Here is my Python script to run them all:
from scrapy import cmdline
file = open('cityNames.txt', 'r')
cityNames = file.read().splitlines()
for city in cityNames:
    url = "http://" + city + ".website.com"
    output = city + ".json"
    cmdline.execute(['scrapy', 'crawl', 'backpage_tester', '-a', "start_url=" + url, '-o', output])
cityNames.txt:
chicago
sanfran
boston
It runs through the first city fine, but then stops after that. It doesn't run sanfran or boston - only chicago. Any thoughts? Thank you!
Your method is using synchronous calls: cmdline.execute() blocks and then exits the whole process once the first crawl finishes, so your loop never reaches the next city. You should either run the crawls asynchronously in Python (asyncio, or one subprocess per crawl) or use a bash script that iterates over a text file of your urls:
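Something along these lines (an untested sketch: it assumes the same cityNames.txt, spider name, and URL pattern from your question, and launches each crawl in the background so they run in parallel):

#!/bin/bash
# Sketch: launch one scrapy process per city listed in cityNames.txt.
# Assumes the backpage_tester spider and URL pattern from the question above.
while read -r city; do
    scrapy crawl backpage_tester \
        -a "start_url=http://${city}.website.com" \
        -o "${city}.json" &
done < cityNames.txt
wait  # wait for all background crawls to finish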
This should issue one scrapy process per url. Be warned, though: this could easily overload your system if the crawls are extensive and deep on each site and your spiders are not properly configured.