I want to use the Python Scrapy module to scrape all the URLs from my website and write the list to a file. I looked in the examples but didn't see any simple example to do this.
Here's the Python program that worked for me:
Save this in a file called `spider.py`. You can then use a shell pipeline to post-process this text:
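A pipeline along these lines, assuming the spider was saved as `spider.py` and prints one URL per line (the `printf` sample stands in for the spider's real output so the filtering steps can be demonstrated):

```shell
# In real use, generate urls.out from the spider; 2>/dev/null hides
# Scrapy's log chatter:
#   scrapy runspider spider.py 2>/dev/null > urls.out
# Stand-in sample output for demonstration:
printf 'http://example.com/a\nhttp://example.com/b\nhttp://example.com/a\nmailto:me@example.com\n' > urls.out

# De-duplicate and drop mailto links and fragment-only anchors.
sort urls.out | uniq | grep -v 'mailto:' | grep -v '#' > unique_urls.txt
cat unique_urls.txt
```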
This gives me a list of all the unique URLs on my site.
Something cleaner (and maybe more useful) would be to use LinkExtractor.