I am having a problem where none of my Scrapy spiders will crawl a website; each one just scrapes a single page and stops. I was under the impression that the rules class attribute was responsible for following links, but I can't get the spider to follow any. I have been following the documentation here: http://doc.scrapy.org/en/latest/topics/spiders.html#crawlspider
What could I be missing that is stopping my spiders from crawling?
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor
from scrapy.selector import Selector
from Example.items import ExItem
class ExampleSpider(CrawlSpider):
    name = "example"
    allowed_domains = ["example.ac.uk"]
    start_urls = (
        'http://www.example.ac.uk',
    )

    rules = (
        Rule(LinkExtractor(allow=("",)), callback="parse_items", follow=True),
    )
Replace your rule with this one:
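A minimal sketch of what the corrected spider could look like, assuming a contrib-era Scrapy release and keeping the parse_items callback name from your snippet (the body of parse_items here is only a placeholder):

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors import LinkExtractor


    class ExampleSpider(CrawlSpider):
        name = "example"
        allowed_domains = ["example.ac.uk"]
        start_urls = ['http://www.example.ac.uk']

        rules = (
            # A LinkExtractor with no arguments matches every link on the page;
            # follow=True tells CrawlSpider to keep crawling from the pages it
            # extracts instead of stopping after the first response.
            Rule(LinkExtractor(), callback="parse_items", follow=True),
        )

        def parse_items(self, response):
            # Placeholder callback -- replace with your own item extraction.
            self.log("Visited %s" % response.url)

Also make sure your callback is not named parse: CrawlSpider uses the parse method internally to apply the rules, so overriding it disables link following and produces exactly this "one page and stop" behaviour.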