How can I combine/speed up multiple API calls to i

Posted 2019-08-06 10:14

Question:

Update: I found something that might be useful, but I'm still having trouble figuring out how to implement it. If I map get_data like so, I'm not sure how to assign the result of each call to its respective variable.

parameters = [
    [service, profile_id, '30daysAgo', 'ga:browser', 'sessions::condition::ga:deviceCategory==desktop'],
    [service, profile_id, '60daysAgo', 'ga:browser', 'sessions::condition::ga:deviceCategory==desktop'],
    ...
    [service, profile_id, '90daysAgo', 'ga:browser,ga:browserVersion', 'sessions::condition::ga:deviceCategory==mobile']
]

with ThreadPoolExecutor(max_workers=4) as executor:
    executor.map(get_data, parameters)
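
On the snippet above: executor.map passes each inner list to get_data as a single argument, so the call needs unpacking, and map yields results in the same order as its inputs, which is what lets each result be matched back to its parameters. A minimal sketch, assuming a stand-in get_data (hypothetical, so the snippet runs without the API) and made-up labels for each query:

```python
from concurrent.futures import ThreadPoolExecutor


def get_data(service, profile_id, days, dimensions, segment):
    # Stand-in for the real API call (assumption, for illustration only).
    return {"days": days, "dimensions": dimensions, "segment": segment}


service, profile_id = None, "12345"  # placeholder values (hypothetical)
desktop = "sessions::condition::ga:deviceCategory==desktop"

# Key each parameter list by a label so results can be looked up by name.
parameters = {
    "desktop_browsers_30": [service, profile_id, "30daysAgo", "ga:browser", desktop],
    "desktop_browsers_60": [service, profile_id, "60daysAgo", "ga:browser", desktop],
    "desktop_browsers_90": [service, profile_id, "90daysAgo", "ga:browser", desktop],
}


def fetch_all(params, max_workers=4):
    """Run get_data for every entry concurrently and return {label: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in input order, so zip() pairs them correctly.
        results = executor.map(lambda args: get_data(*args), params.values())
        return dict(zip(params.keys(), results))


results = fetch_all(parameters)
data1 = [results["desktop_browsers_30"],
         results["desktop_browsers_60"],
         results["desktop_browsers_90"]]
```

With the real client there is one caveat this sketch leaves out: as far as I know, the google-api-python-client documentation describes the underlying httplib2 transport as not thread-safe, so each thread may need its own http object.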

I'm writing a Python application (using the Google Analytics API) that lets a user get a report of the top 10 desktop browsers, desktop browsers broken down by version, mobile browsers, and mobile OS's used to access a given site over the last 30, 60, and 90 days. As of right now, everything seems to be working fine.

However, performance is all over the place. There are 12 API requests being made - 3 for each of the 4 sets of data. Sometimes the application takes about 10 seconds to run, and sometimes it takes well over a minute. It seems like it's all dependent on how the API is responding. So my question is: are there ways I can either combine some of these requests or arrange them in such a way that they'll get executed concurrently?

I tried looking into ways to consolidate the requests so that I'd only have to make one request per set of data, returning information for 30, 60, and 90 days, but I wasn't able to find anything. As for getting the requests to run concurrently, I'm just not quite sure how to go about doing something like that. The closest thing I could find was this question/answer, but I couldn't quite follow the answer about batch processing.

Here's the relevant code:

def get_data(service, profile_id, days, dimensions, segment):
    return service.data().ga().get(
        ids='ga:' + profile_id,
        start_date=days,
        end_date='today',
        metrics='ga:sessions',
        dimensions=dimensions,
        sort='-ga:sessions',
        segment=segment,
        max_results=10).execute()


def get_results(service, profile_id):
    global glob_startdate
    global glob_months

    # get top 10 desktop browsers
    print("Getting top 10 desktop browsers...")
    data_1a = get_data(service, profile_id, '30daysAgo', 'ga:browser', 'sessions::condition::ga:deviceCategory==desktop')
    data_1b = get_data(service, profile_id, '60daysAgo', 'ga:browser', 'sessions::condition::ga:deviceCategory==desktop')
    data_1c = get_data(service, profile_id, '90daysAgo', 'ga:browser', 'sessions::condition::ga:deviceCategory==desktop')
    data1 = [data_1a, data_1b, data_1c]

    # get top 10 desktop browser versions
    print("Getting top 10 desktop browser versions...")
    data_2a = get_data(service, profile_id, '30daysAgo', 'ga:browser,ga:browserVersion', 'sessions::condition::ga:deviceCategory==desktop')
    data_2b = get_data(service, profile_id, '60daysAgo', 'ga:browser,ga:browserVersion', 'sessions::condition::ga:deviceCategory==desktop')
    data_2c = get_data(service, profile_id, '90daysAgo', 'ga:browser,ga:browserVersion', 'sessions::condition::ga:deviceCategory==desktop')
    data2 = [data_2a, data_2b, data_2c]

    # get top 10 mobile OS's
    print("Getting top 10 mobile OS's...")
    data_3a = get_data(service, profile_id, '30daysAgo', 'ga:operatingSystem,ga:operatingSystemVersion', 'sessions::condition::ga:deviceCategory==mobile')
    data_3b = get_data(service, profile_id, '60daysAgo', 'ga:operatingSystem,ga:operatingSystemVersion', 'sessions::condition::ga:deviceCategory==mobile')
    data_3c = get_data(service, profile_id, '90daysAgo', 'ga:operatingSystem,ga:operatingSystemVersion', 'sessions::condition::ga:deviceCategory==mobile')
    data3 = [data_3a, data_3b, data_3c]

    # get top 10 mobile browsers
    print("Getting top 10 mobile browsers...")
    data_4a = get_data(service, profile_id, '30daysAgo', 'ga:browser,ga:browserVersion', 'sessions::condition::ga:deviceCategory==mobile')
    data_4b = get_data(service, profile_id, '60daysAgo', 'ga:browser,ga:browserVersion', 'sessions::condition::ga:deviceCategory==mobile')
    data_4c = get_data(service, profile_id, '90daysAgo', 'ga:browser,ga:browserVersion', 'sessions::condition::ga:deviceCategory==mobile')
    data4 = [data_4a, data_4b, data_4c]

Thanks!

Answer 1:

Because of API quotas and limits, you can batch up to 10 requests at a time.

from apiclient.http import BatchHttpRequest
import httplib2


def call_back(request_id, response, exception):
    """Do something with the response of each call"""
    pass

def get_request(service, profile_id, days, dimensions, segment):
    """Note I removed the execute() from the end of this method."""
    return service.data().ga().get(
        ids='ga:' + profile_id,
        start_date=days,
        end_date='today',
        metrics='ga:sessions',
        dimensions=dimensions,
        sort='-ga:sessions',
        segment=segment,
        max_results=10)

# Create a batch Http Request object
batch = BatchHttpRequest(callback=call_back)


# Construct your queries.
# get top 10 desktop browsers
print("Getting top 10 desktop browsers...")
request_1a = get_request(service, profile_id, '30daysAgo', 'ga:browser', 'sessions::condition::ga:deviceCategory==desktop')
request_1b = get_request(service, profile_id, '60daysAgo', 'ga:browser', 'sessions::condition::ga:deviceCategory==desktop')
request_1c = get_request(service, profile_id, '90daysAgo', 'ga:browser', 'sessions::condition::ga:deviceCategory==desktop')

for request in [request_1a, request_1b, request_1c]:
    batch.add(request)

batch.execute(http=httplib2.Http())
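
The callback is where each response has to be matched to the query that produced it. One way to do that, sketched here under assumptions, is to key the responses by request_id, which batch.add accepts as an optional argument. BatchCollector and the labels below are hypothetical names, and the callbacks are invoked by hand with placeholder payloads so the snippet runs without the API:

```python
class BatchCollector:
    """Collects batch responses and errors, keyed by request_id."""

    def __init__(self):
        self.responses = {}
        self.errors = {}

    def callback(self, request_id, response, exception):
        # Same (request_id, response, exception) signature as call_back.
        if exception is not None:
            self.errors[request_id] = exception
        else:
            self.responses[request_id] = response


collector = BatchCollector()

# With the real client this would be wired up roughly as (not executed here):
#   batch = BatchHttpRequest(callback=collector.callback)
#   batch.add(request_1a, request_id="desktop_browsers_30")
#   batch.execute(http=httplib2.Http())

# Simulated callbacks standing in for what the batch would invoke.
collector.callback("desktop_browsers_30", {"rows": []}, None)
collector.callback("desktop_browsers_60", None, RuntimeError("simulated failure"))

data_1a = collector.responses["desktop_browsers_30"]
```

Keeping errors in a separate dict means one failed request does not hide the results of the others in the same batch.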