Setting Scrapy start_urls from a Script

Posted 2019-05-25 04:35

I have a working scrapy spider and I'm able to run it through a separate script following the example here. I have also created a wxPython GUI for my script that simply contains a multi-line TextCtrl for users to input a list of URLs to scrape and a button to submit. Currently the start_urls are hardcoded into my spider. How can I pass the URLs entered in my TextCtrl to the start_urls array in my spider? Thanks in advance for the help!
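For the GUI side of the question, the task reduces to turning the TextCtrl's multi-line value into a clean list of URLs before handing it to the spider. A minimal sketch, assuming the input arrives as a newline-separated string (the helper name `urls_from_text` is illustrative, not part of wxPython or Scrapy):

```python
def urls_from_text(raw):
    """Turn a raw multi-line string (one URL per line) into a clean list
    suitable for start_urls. Blank lines and non-URL lines are dropped."""
    urls = []
    for line in raw.splitlines():
        line = line.strip()
        if line and line.startswith(("http://", "https://")):
            urls.append(line)
    return urls
```

In the wxPython button handler this would be called as something like `urls_from_text(self.text_ctrl.GetValue())`, and the resulting list passed to the spider using either answer below.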

2 Answers
劫难
2019-05-25 05:01

Just set start_urls on your Spider instance before starting the crawl:

# Works when you construct the spider instance yourself, as in the
# older Crawler API example linked in the question:
spider = FollowAllSpider(domain=domain)
spider.start_urls = ['http://google.com']
ゆ 、 Hurt°
2019-05-25 05:03

alecxe's answer doesn't work for me. This solution works with Scrapy==1.0.3:

from scrapy.crawler import CrawlerProcess
from tutorial.spiders.some_spider import SomeSpider

process = CrawlerProcess()

# extra keyword arguments to crawl() are forwarded to the spider's constructor
process.crawl(SomeSpider, start_urls=["http://www.example.com"])
process.start()

It might help someone in the future.
