How to set deadline for BigQuery on Google App Engine

Published 2019-05-02 07:04

I have a Google App Engine program that calls BigQuery for data.

The query usually takes 3–4.5 seconds and completes fine, but sometimes it takes over five seconds and throws this error:

DeadlineExceededError: The API call urlfetch.Fetch() took too long to respond and was cancelled.

This article shows the deadlines and the different kinds of deadline errors.

Is there a way to set the deadline for a BigQuery job to more than 5 seconds? I could not find one in the BigQuery API docs.

5 Answers
SAY GOODBYE
#2 · 2019-05-02 07:45

BigQuery queries are fast, but they often take longer than App Engine's default urlfetch timeout. The BigQuery API is asynchronous, so you need to break the work into API calls that each take less than 5 seconds.

For this situation, I would use the App Engine Task Queue; a Python sketch of these steps follows the list below:

  1. Make a call to the BigQuery API to insert your job. This returns a JobID.

  2. Place a task on the App Engine task queue to check the status of the BigQuery job with that ID.

  3. If the BigQuery Job Status is not "DONE", place a new task on the queue to check it again.

  4. If the status is "DONE", make a call using urlfetch to retrieve the results.
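
A minimal sketch of those four steps in Python, assuming a first-generation App Engine app using webapp2, an authorized BigQuery client built with apiclient.discovery (called bq_service here), and a PROJECT_ID constant; the /check_bq_job URL and the one-second countdown are illustrative choices, not fixed values:

import webapp2
from google.appengine.api import taskqueue

def start_query(sql):
    # Step 1: insert an asynchronous query job. This call returns
    # immediately with a job reference instead of waiting for results.
    job = bq_service.jobs().insert(
        projectId=PROJECT_ID,
        body={'configuration': {'query': {'query': sql}}},
    ).execute()
    job_id = job['jobReference']['jobId']
    # Step 2: enqueue a task to poll the job's status.
    taskqueue.add(url='/check_bq_job', params={'job_id': job_id}, countdown=1)

class CheckBigQueryJob(webapp2.RequestHandler):
    def post(self):
        job_id = self.request.get('job_id')
        job = bq_service.jobs().get(
            projectId=PROJECT_ID, jobId=job_id).execute()
        if job['status']['state'] != 'DONE':
            # Step 3: not done yet -- re-enqueue and check again shortly.
            taskqueue.add(url='/check_bq_job',
                          params={'job_id': job_id}, countdown=1)
            return
        # Step 4: the job is done -- retrieve the results.
        results = bq_service.jobs().getQueryResults(
            projectId=PROJECT_ID, jobId=job_id).execute()
        # ... process results['rows'] ...

Each individual API call stays well under the 5-second urlfetch deadline, and the polling happens inside a task handler, which gets App Engine's 10-minute deadline.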

狗以群分
#3 · 2019-05-02 07:47

This is one way to solve BigQuery timeouts on App Engine for Go: set TimeoutMs on your queries to well below 5000. The default timeout for BigQuery queries is 10000 ms, which is over App Engine's default 5-second deadline for outgoing requests.

The gotcha is that the timeout must be set both in the initial request, bigquery.service.Jobs.Query(…), and in the subsequent bigquery.service.Jobs.GetQueryResults(…) call that you use to poll the query results.

Example:

const defaultTimeoutMs = 3000 // well below App Engine's 5-second urlfetch deadline

query := &gbigquery.QueryRequest{
    DefaultDataset: &gbigquery.DatasetReference{
        DatasetId: "mydatasetid",
        ProjectId: "myprojectid",
    },
    Kind:      "bigquery#queryRequest",
    Query:     "<insert query here>",
    TimeoutMs: defaultTimeoutMs, // <- important!
}

// Do() returns both the response and an error.
queryResponse, err := bigquery.service.Jobs.Query("myprojectid", query).Do()
if err != nil {
    // handle error
}

// Determine whether queryResponse is a completed job (check JobComplete);
// if not, start to poll:

queryResponseResults, err := bigquery.service.Jobs.
    GetQueryResults("myprojectid", queryResponse.JobReference.JobId).
    TimeoutMs(defaultTimeoutMs). // <- important!
    Do()
if err != nil {
    // handle error
}

// Determine whether queryResponseResults is a completed job;
// if not, continue to poll.

The nice thing about this is that you keep the default deadline for the overall request (60 seconds for normal requests and 10 minutes for tasks and cron jobs) while avoiding setting the deadline for outgoing requests to some arbitrarily large value.

Bombasti
#4 · 2019-05-02 07:52

To issue HTTP requests on App Engine you can use urllib, urllib2, httplib, or urlfetch. However, no matter which library you choose, App Engine performs the HTTP requests through its URL Fetch service.

The googleapiclient uses httplib2. It looks like httplib2.Http passes its timeout to urlfetch, and since that timeout defaults to None, urlfetch sets the deadline of the request to 5 seconds no matter what you set with urlfetch.set_default_fetch_deadline.

Under the covers httplib2 uses the socket library for HTTP requests.

To set the timeout you can do the following:

import socket
socket.setdefaulttimeout(30)  # default timeout for sockets opened after this call

You should also be able to do this, but I haven't tested it:

http = httplib2.Http(timeout=30)

If you don't have existing code to time the request, you can wrap your query like so:

import time
start_query = time.time()

# <your query code here>

end_query = time.time()
print(end_query - start_query)
forever°为你锁心
#5 · 2019-05-02 08:04

Note: I would go with Michael's suggestion, since it is the most robust. I just want to point out that you can increase the urlfetch timeout up to 60 seconds, which should be enough time for most queries to complete. See:

How to set timeout for urlfetch in Google App Engine?
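
For reference, a minimal sketch of raising the deadline; set_default_fetch_deadline() affects urlfetch calls made after it runs in the same instance:

from google.appengine.api import urlfetch

# Raise the default deadline for subsequent urlfetch calls;
# 60 seconds is the maximum App Engine allows.
urlfetch.set_default_fetch_deadline(60)

Note, though, that as other answers here point out, httplib2 can override this default with its own timeout, so for the BigQuery client you may need to set the timeout when authorizing the session instead.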

戒情不戒烟
#6 · 2019-05-02 08:04

I was unable to get the urlfetch.set_default_fetch_deadline() method to apply to the BigQuery API, but I was able to increase the timeout when authorizing the BigQuery session as follows:

from apiclient.discovery import build
from httplib2 import Http
from oauth2client.service_account import ServiceAccountCredentials

credentials = ServiceAccountCredentials.from_json_keyfile_dict(credentials_dict, scopes)

# Create an authorized session and set the URL Fetch timeout (in seconds).
http_auth = credentials.authorize(Http(timeout=60))

# Build the service.
service = build(service_name, version, http=http_auth)

# Make the query (projectId is required by jobs().query).
request = service.jobs().query(projectId=project_id, body=query_body).execute()

Or, with an asynchronous approach using jobs().insert:

import time

# jobs().insert expects a job resource with the query wrapped in
# {'configuration': {'query': {...}}} and returns a job reference.
query_response = service.jobs().insert(projectId=project_id, body=query_body).execute()

big_query_job_id = query_response['jobReference']['jobId']

# Poll the jobs().get endpoint until the job is complete.
while True:

    job_status_response = service.jobs()\
        .get(projectId=project_id, jobId=big_query_job_id).execute()

    if job_status_response['status']['state'] == 'DONE':
        break

    time.sleep(1)

results_response = service.jobs()\
    .getQueryResults(projectId=project_id, jobId=big_query_job_id)\
    .execute()

We ended up going with an approach similar to what Michael suggests above; however, even when using the asynchronous call, the getQueryResults method (paginated with a small maxResults parameter) was timing out on URL Fetch, throwing the error posted in the question.

So, in order to increase the URL Fetch timeout for BigQuery on App Engine, set the timeout accordingly when authorizing your session.
