I'm a little new to Python and very new to Scrapy.
I've set up a spider to crawl and extract all the information I need. However, I need to pass a .txt file of URLs to the start_urls variable.
For example:
class LinkChecker(BaseSpider):
    name = 'linkchecker'
    start_urls = []  # Here I want the list of urls to crawl, read from a text file I pass via the command line.
I've done a little bit of research and keep coming up empty-handed. I've seen this type of example (How to pass a user defined argument in scrapy spider), but I don't think that will work for passing a text file.
you could simply read-in the .txt file:
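A minimal sketch of that read, assuming the URLs live in a file called urls.txt (the name is just an example), one per line; the snippet writes a small example file first so it is self-contained:

```python
# Create an example urls.txt (in practice the file already exists).
with open("urls.txt", "w") as f:
    f.write("http://example.com/a\nhttp://example.com/b\n")

# readlines() keeps each line as-is, including any trailing "\n".
with open("urls.txt") as f:
    start_urls = f.readlines()

print(start_urls)  # each entry still ends with "\n"
```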
if you end up with trailing newline characters, try:
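For instance, `splitlines()` drops the trailing newlines (again assuming an example urls.txt):

```python
# Create an example urls.txt (in practice the file already exists).
with open("urls.txt", "w") as f:
    f.write("http://example.com/a\nhttp://example.com/b\n")

# read().splitlines() returns the lines without their "\n".
with open("urls.txt") as f:
    start_urls = f.read().splitlines()

print(start_urls)
```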
Hope this helps
If your urls are line-separated, then these lines of code will give you the urls.
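One way to read them, sketched with an assumed filename urls.txt; stripping each line also skips blanks and stray whitespace:

```python
# Create an example urls.txt (in practice the file already exists).
with open("urls.txt", "w") as f:
    f.write("http://example.com/a\n\nhttp://example.com/b\n")

# One url per line; strip whitespace and skip empty lines.
with open("urls.txt") as f:
    start_urls = [line.strip() for line in f if line.strip()]

print(start_urls)
```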
Run your spider with the -a option. Then read the file in the __init__ method of the spider and define start_urls there.

Hope that helps.
This will be your code. It will pick up the urls from the .txt file if each url is on its own line, like:
url1
url2
Let's say your filename is 'file.txt'; then run the command:
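The command might look like this, assuming the spider reads the file name from a -a argument named filename (match whatever your spider's __init__ actually expects):

```shell
# 'filename' is an assumed argument name passed through to the spider.
scrapy crawl linkchecker -a filename=file.txt
```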