I can successfully scrape data for a single account. Now I want to scrape multiple accounts on a single website, and each account needs its own login. How can I manage login/logout?
Question:
Answer 1:
You can scrape multiple accounts in parallel by using a separate cookiejar for each account session. See the "cookiejar" request meta key at http://doc.scrapy.org/en/latest/topics/downloader-middleware.html?highlight=cookiejar#std:reqmeta-cookiejar
To clarify: suppose we have a list of accounts in settings.py:
MY_ACCOUNTS = [
    {'login': 'my_login_1', 'pwd': 'my_pwd_1'},
    {'login': 'my_login_2', 'pwd': 'my_pwd_2'},
]
And this is the link to the login page: http://example.com/login
Create a start_requests method in your spider. In this method we can loop over MY_ACCOUNTS and log in to each account:
from scrapy.http import FormRequest

def start_requests(self):
    requests = []
    # Send one login request per account, each tagged with its own cookiejar
    for i, account in enumerate(self.crawler.settings['MY_ACCOUNTS']):
        request = FormRequest(
            'http://example.com/login',
            formdata={'form_login_name': account['login'],
                      'form_pwd_name': account['pwd']},
            callback=self.parse,
            dont_filter=True,
        )
        request.meta['cookiejar'] = i
        requests.append(request)
    return requests
form_login_name and form_pwd_name are the names of the login and password fields on the login form, respectively.
dont_filter=True tells Scrapy not to apply its duplicate-request filter to these requests, since every login is a POST to the same page, http://example.com/login
request.meta['cookiejar'] = i keeps the cookies of each session (login) separate. Don't forget to pass the cookiejar identifier along in every subsequent request. For example, suppose you want Scrapy to request another page after login:
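from scrapy.http import Request

def parse(self, response):
    """ make some manipulation here ... """
    # Forward the same cookiejar key so the request stays in this account's session
    yield Request(my_url, meta={'cookiejar': response.meta['cookiejar']}, callback=my_callback)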
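
To see why the cookiejar key has to be forwarded on every request, here is a minimal toy model in plain Python. It is only a sketch of the idea (one independent cookie store per key), not Scrapy's actual implementation. The SessionStore class, its method names, and the session values are hypothetical and exist only for illustration.

```python
from collections import defaultdict


class SessionStore:
    """Toy model: one independent cookie dict per 'cookiejar' key."""

    def __init__(self):
        # Each distinct key gets its own empty cookie dict on first use.
        self.jars = defaultdict(dict)

    def store_cookies(self, meta, cookies):
        # Cookies set by a response go into the jar named by the request's meta key.
        self.jars[meta.get("cookiejar")].update(cookies)

    def cookies_for(self, meta):
        # A request only sees the cookies in the jar its meta key points at.
        return dict(self.jars[meta.get("cookiejar")])


store = SessionStore()

# Two logins, each tagged with its own jar id (0 and 1), as in start_requests.
store.store_cookies({"cookiejar": 0}, {"sessionid": "session-of-my_login_1"})
store.store_cookies({"cookiejar": 1}, {"sessionid": "session-of-my_login_2"})

# A follow-up request that forwards the key keeps its own session.
print(store.cookies_for({"cookiejar": 1}))  # {'sessionid': 'session-of-my_login_2'}

# A follow-up request that omits the key lands in the default jar (key None),
# which holds neither login's session cookie.
print(store.cookies_for({}))  # {}
```

In this model, a request that drops the key is effectively logged out, which is why every callback in the spider should copy response.meta['cookiejar'] into the requests it yields.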