This has been asked before, but the answer that always comes up is to use DjangoItem. However, its GitHub page states that it is:
often not a good choice for a write intensive applications (such as a web crawler) ... may not scale well
This is the crux of my problem: I'd like to use and interact with my Django models in the same way I can when I run python manage.py shell and do from myapp.models import Model1, using queries like the ones seen here.
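For concreteness, this is the kind of usage I mean (Model1 is just the placeholder model name from above, and its name field is only an example):

from myapp.models import Model1

# The sort of queries I run in the shell and would like to run from a pipeline;
# the "name" field is purely illustrative.
obj, created = Model1.objects.get_or_create(name="example")
matches = Model1.objects.filter(name__icontains="example").count()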
I have tried relative imports and moving my whole Scrapy project inside my Django app, both to no avail.
Where should I move my Scrapy project for this to work? How can I recreate/use all the methods that are available to me in the shell inside a Scrapy pipeline?
Thanks in advance.
Here I have created a sample project that uses Scrapy inside Django and uses Django models and the ORM in one of the pipelines:
https://github.com/bipul21/scrapy_django
The directory structure starts with your Django project; in this case the project name is django_project.
Once inside the base project, you create your Scrapy project, i.e. scrapy_project here.
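Roughly, the layout looks like this (inferred from the names used in this answer and the linked repo, so the exact files may differ):

django_project/              # Django project root (contains manage.py)
    manage.py
    django_project/
        settings.py
    questions/               # Django app that defines the Questions model
        models.py
    scrapy_project/          # Scrapy project created inside the Django project
        scrapy.cfg
        scrapy_project/
            settings.py      # where the Django setup snippet below goes
            pipelines.py
            spiders/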
In your Scrapy project settings, add the following lines to set up and initialize Django:
import os
import sys

import django

# Make the Django project root importable, then point Django at its settings
# before calling django.setup() so the ORM can be used from Scrapy code.
sys.path.append(os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), ".."))
os.environ['DJANGO_SETTINGS_MODULE'] = 'django_project.settings'
django.setup()
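The pipeline also has to be registered in the same Scrapy settings file if the repo does not already do so; assuming the class shown below lives in scrapy_project/pipelines.py, the entry would look something like this:

# Module path is an assumption based on the project layout above
ITEM_PIPELINES = {
    'scrapy_project.pipelines.ScrapyProjectPipeline': 300,
}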
In the pipeline I have made a simple query against the Questions model:
from questions.models import Questions


class ScrapyProjectPipeline(object):
    def process_item(self, item, spider):
        try:
            # Skip items that are already stored, keyed by the unique identifier
            question = Questions.objects.get(identifier=item["identifier"])
            print("Question already exists")
            return item
        except Questions.DoesNotExist:
            pass

        # Create and save a new Questions row from the scraped item
        question = Questions()
        question.identifier = item["identifier"]
        question.title = item["title"]
        question.url = item["url"]
        question.save()
        return item
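The spider needs to yield items carrying the keys the pipeline reads; a minimal items.py sketch (the class name QuestionItem is my own placeholder, the field names are taken from the pipeline) could be:

import scrapy


class QuestionItem(scrapy.Item):
    # Field names mirror the keys the pipeline reads
    identifier = scrapy.Field()
    title = scrapy.Field()
    url = scrapy.Field()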
You can check the project for any further details, such as the model schema.
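For orientation, a rough sketch of what the Questions model might look like; the field types and lengths here are guesses, so check questions/models.py in the repo for the real schema:

from django.db import models


class Questions(models.Model):
    # Field names match what the pipeline sets; types and lengths are assumptions
    identifier = models.CharField(max_length=255, unique=True)
    title = models.CharField(max_length=255)
    url = models.URLField()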