Python requests module multithreading

2020-03-08 06:38发布

问题:

Is there a possible way to speed up my code using multiprocessing interface? The problem is that this interface uses map function, which works only with 1 function. But my code has 3 functions. I tried to combine my functions into one, but didn't get success. My script reads the URL of site from file and performs 3 functions over it. For Loop makes it very slow, because I got a lot of URLs

import requests

def Login(url): #Log in     
    payload = {
        'UserName_Text'     : 'user',
        'UserPW_Password'   : 'pass',
        'submit_ButtonOK'   : 'return buttonClick;'  
      }

    try:
        p = session.post(url+'/login.jsp', data = payload, timeout=10)
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
        print "site is DOWN! :", url[8:]
        session.cookies.clear()
        session.close() 
    else:
        print 'OK: ', p.url

def Timer(url): #Measure request time
    try:
        timer = requests.get(url+'/login.jsp').elapsed.total_seconds()
    except (requests.exceptions.ConnectionError):
        print 'Request time: None'
        print '-----------------------------------------------------------------'
    else: 
        print 'Request time:', round(timer, 2), 'sec'

def Logout(url): # Log out
    try:
        logout = requests.get(url+'/logout.jsp', params={'submit_ButtonOK' : 'true'}, cookies = session.cookies)
    except(requests.exceptions.ConnectionError):
        pass
    else:
        print 'Logout '#, logout.url
        print '-----------------------------------------------------------------'
        session.cookies.clear()
        session.close()
for line in open('text.txt').read().splitlines():
    session = requests.session()
    Login(line)
    Timer(line)
    Logout(line)

回答1:

Yes, you can use multiprocessing.

from multiprocessing import Pool

def f(line):
    session = requests.session()
    Login(session, line)
    Timer(session, line)
    Logout(session, line)        

if __name__ == '__main__':
    urls = open('text.txt').read().splitlines()
    p = Pool(5)
    print(p.map(f, urls))

The requests session cannot be global and shared between workers, every worker should use its own session.

You write that you already "tried to combine my functions into one, but didn't get success". What exactly didn't work?



回答2:

There are many ways to accomplish your task, but multiprocessing is not needed at that level, it will just add complexity, imho.

Take a look at gevent, greenlets and monkey patching, instead!

Once your code is ready, you can wrap a main function into a gevent loop, and if you applied the monkey patches, the gevent framework will run N jobs concurrently (you can create a jobs pool, set the limits of concurrency, etc.)

This example should help:

#!/usr/bin/python
# Copyright (c) 2009 Denis Bilenko. See LICENSE for details.

"""Spawn multiple workers and wait for them to complete"""
from __future__ import print_function
import sys

urls = ['http://www.google.com', 'http://www.yandex.ru', 'http://www.python.org']

import gevent
from gevent import monkey

# patches stdlib (including socket and ssl modules) to cooperate with other greenlets
monkey.patch_all()


if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    from urllib2 import urlopen


def print_head(url):
    print('Starting %s' % url)
    data = urlopen(url).read()
    print('%s: %s bytes: %r' % (url, len(data), data[:50]))

jobs = [gevent.spawn(print_head, url) for url in urls]

gevent.wait(jobs)

You can find more here and in the Github repository, from where this example comes from

P.S. Greenlets will works with requests as well, you don't need to change your code.