Making multiple API calls in parallel using Python

Published 2020-05-24 05:18

Question:

I am working with Python (IPython & Canopy) and a RESTful content API, on my local machine (Mac).

I have an array of 3000 unique IDs whose data I need to pull from the API, and the API only accepts one ID per call.

I was hoping to run the calls in parallel somehow, e.g. as 3 sets of 1000 calls, to speed things up.

What is the best way of doing this?

Thanks in advance for any help!

Answer 1:

Without more information about what you are doing in particular, it is hard to say for sure, but a simple threaded approach may make sense.

Assuming you have a simple function that processes a single ID:

import requests

url_t = "http://localhost:8000/records/%i"

def process_id(id):
    """process a single ID"""
    # fetch the data
    r = requests.get(url_t % id)
    # parse the JSON reply
    data = r.json()
    # and update some data with PUT
    requests.put(url_t % id, data=data)
    return data

You can expand that into a simple function that processes a range of IDs:

def process_range(id_range, store=None):
    """process a number of ids, storing the results in a dict"""
    if store is None:
        store = {}
    for id in id_range:
        store[id] = process_id(id)
    return store

and finally, you can fairly easily map sub-ranges onto threads to allow some number of requests to be concurrent:

from threading import Thread

def threaded_process_range(nthreads, id_range):
    """process the id range in a specified number of threads"""
    store = {}
    threads = []
    # create the threads, striping the IDs across them
    for i in range(nthreads):
        ids = id_range[i::nthreads]
        t = Thread(target=process_range, args=(ids, store))
        threads.append(t)

    # start the threads
    for t in threads:
        t.start()
    # wait for all threads to finish
    for t in threads:
        t.join()
    return store
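The `id_range[i::nthreads]` slice stripes the IDs across the threads round-robin. A quick standalone check of how the split works (no API needed):

```python
ids = list(range(10))
nthreads = 3
# each thread i gets every nthreads-th ID, starting at offset i
chunks = [ids[i::nthreads] for i in range(nthreads)]
print(chunks)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```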

A full example in an IPython Notebook: http://nbviewer.ipython.org/5732094

If your individual tasks take a more widely varied amount of time, you may want to use a thread pool (e.g. multiprocessing.pool.ThreadPool or concurrent.futures.ThreadPoolExecutor), which assigns jobs one at a time. This is often slower if individual tasks are very small, but it guarantees better balance in heterogeneous cases.
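A sketch of the pool approach using concurrent.futures (in the standard library since Python 3.2); here fetch_one is a hypothetical stand-in for the real API call, so replace its body with the requests.get/put logic from process_id above:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_one(id):
    # stand-in for the real API call; swap in requests.get(url_t % id) etc.
    return {"id": id}

def pool_process(ids, max_workers=3):
    """fetch every ID via a thread pool that hands out jobs one at a time"""
    store = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit one job per ID; map each future back to its ID
        futures = {pool.submit(fetch_one, i): i for i in ids}
        # collect results as they complete, in whatever order they finish
        for future in as_completed(futures):
            store[futures[future]] = future.result()
    return store

results = pool_process(range(10))
```

Because the pool dispatches one job at a time, a slow request only ties up one worker instead of delaying a whole pre-assigned stripe of IDs.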