My experience with Scrapy is limited, and every time I've used it, it has been through terminal commands. How can I get my form data (a URL to be scraped) from my Django template to communicate with Scrapy and start scraping? So far, the only approach I've thought of is to take the form's returned data in Django's views and then reach into the spider.py in Scrapy's directory to add the form's URL to the spider's start_urls. From there, I don't really know how to trigger the actual crawling, since I'm used to doing it strictly through the terminal with commands like "scrapy crawl dmoz". Thanks.
Tiny edit: just discovered scrapyd... I think I may be headed in the right direction with this.
You've actually answered it with your edit. The best option would be to set up a `scrapyd` service and make an API call to `schedule.json` to trigger a scraping job.

To make that HTTP API call, you can either use `urllib2`/`requests`, or use a wrapper around the `scrapyd` API, `python-scrapyd-api`:
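For illustration, here is a minimal sketch of both options, assuming scrapyd is running on its default port (6800); the project name, spider name, and target URL are placeholders:

```python
import requests
from scrapyd_api import ScrapydAPI  # pip install python-scrapyd-api

# Option 1: hit scrapyd's schedule.json endpoint directly with requests.
response = requests.post('http://localhost:6800/schedule.json', data={
    'project': 'myproject',       # placeholder: the project deployed to scrapyd
    'spider': 'dmoz',
    'url': 'http://example.com',  # extra fields are passed to the spider as arguments
})
print(response.json())            # e.g. {'status': 'ok', 'jobid': '...'}

# Option 2: the same call through the python-scrapyd-api wrapper.
scrapyd = ScrapydAPI('http://localhost:6800')
job_id = scrapyd.schedule('myproject', 'dmoz', url='http://example.com')
```

Inside the spider, an argument passed this way becomes an attribute (`self.url` here), which you can use to seed `start_urls`.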
If we put aside `scrapyd` and try to run the spider from the view, it will block the request until the Twisted reactor stops; therefore, it is not really an option.

You can, though, start using `celery` (in tandem with `django_celery`): define a task that runs your Scrapy spider and call that task from your Django view. This way, you put the task on the queue and don't leave a user waiting for the crawling to finish. A minimal sketch of such a task and view follows.

Also, take a look at the django-dynamic-scraper package.
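A sketch of that setup, assuming a configured Celery app; the task simply shells out to the same `scrapy crawl` command you already run in the terminal, and the spider name, project path, and form field name are placeholders:

```python
# tasks.py
import subprocess
from celery import shared_task

@shared_task
def run_spider(url):
    # Run the familiar terminal command from the worker process;
    # cwd must be the directory containing scrapy.cfg (placeholder path).
    subprocess.check_call(
        ['scrapy', 'crawl', 'dmoz', '-a', 'url=' + url],
        cwd='/path/to/scrapy/project',
    )

# views.py
from django.http import HttpResponse
from .tasks import run_spider

def start_crawl(request):
    url = request.POST['url']  # the form field holding the url to scrape
    run_spider.delay(url)      # queued; the request returns immediately
    return HttpResponse('Crawl scheduled for %s' % url)
```

Shelling out keeps the Twisted reactor out of both the web and worker processes entirely; the task could just as well make the scrapyd API call shown above instead.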