List Scheduled Queries in BigQuery

Published 2020-07-23 04:09

I have a need to (programmatically) analyse the details of Scheduled Queries in BigQuery (e.g. which tables are updated and which tables are accessed in the SQL). I have done something similar for BQ tables/views using the Apps Script BigQuery.Tables.list() method, but I cannot find an API to access the Scheduled Queries.

The UI is able to list them, so I feel this should be possible programmatically, e.g. via a REST API. Does anyone know if this is possible, which interfaces are supported (Apps Script, REST, ...), and possibly have an example of how to use it?

3 Answers

迷人小祖宗 · #2 · 2020-07-23 04:57

Here is a shell script that will notify you in Slack if any of your scheduled queries fail. Simply integrate it into your existing workflow (as blocking or non-blocking) or run it on a separate cron job. I have used httpie to send my HTTP POST, but you can also use curl or another tool, and you can swap the HTTP POST for any other action.

See the API documentation for the potential states of the transfer state object.

#!/bin/bash

# httpie provides the `http` command used below
sudo apt-get install httpie

location=US
authToken="$(gcloud auth print-access-token)"
projectId=$(gcloud config get-value project 2>/dev/null)

# list scheduled-query transfer configs via the Data Transfer REST API
scheduled_queries=$(curl -H "Authorization: Bearer $authToken" \
"https://bigquerydatatransfer.googleapis.com/v1/projects/$projectId/locations/$location/transferConfigs?dataSourceIds=scheduled_query")

# pretty print results
echo "$scheduled_queries" | python -m json.tool

# count lines mentioning FAILED in the response
length=$(echo "$scheduled_queries" | grep -c FAILED)
if [ "$length" -gt 0 ]; then
    echo "A SCHEDULED TRANSFER HAS FAILED"
    http POST https://hooks.slack.com/services/<your slack channel> text="A SCHEDULED TRANSFER HAS FAILED: HERE IS STDOUT  >>> $scheduled_queries"
else
    echo "No errors in scheduled transfers"
fi
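
If jq is available, a more precise check than grepping the raw JSON is to inspect the state field of each config directly. A sketch (the state names are taken from the Data Transfer API documentation):

# print the name and state of each config (documented states include
# PENDING, RUNNING, SUCCEEDED, FAILED, CANCELLED)
echo "$scheduled_queries" | jq -r '.transferConfigs[]? | "\(.name)\t\(.state)"'

# count only genuinely FAILED configs instead of grepping the raw JSON
failed=$(echo "$scheduled_queries" | jq '[.transferConfigs[]? | select(.state == "FAILED")] | length')
echo "failed transfers: $failed"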

If running outside the cloud, or on a machine that has not already authenticated, you will need to authenticate beforehand. To do this, run the following before the script above:

echo "$PROD_KEY" > temp_json.json
gcloud auth activate-service-account --key-file=temp_json.json --project=$PROJ_NAME_PROD

where $PROD_KEY holds the JSON service-account key for whatever you are trying to access.
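
If you do not yet have a key for the service account, one way to create one is with gcloud (a sketch; the service-account email below is a placeholder):

# hypothetical service-account email: replace with your own
gcloud iam service-accounts keys create temp_json.json \
    --iam-account=scheduled-query-reader@my-project.iam.gserviceaccount.com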

我只想做你的唯一 · #3 · 2020-07-23 04:58

The other answers here are excellent responses using the REST API. For completeness, I'd like to include the CLI approach to solving the same thing. Personally, I find this better suited to shell scripts, but YMMV.

Example: List Scheduled Queries from the Default Project.

bq ls --transfer_config --transfer_location=US --format=prettyjson

Example: Details of a Scheduled Query from the Default Project.

bq show --format=prettyjson --transfer_config [RESOURCE_NAME]
# RESOURCE_NAME is a value you can get from the above bq ls command.
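
Combining the two, here is a minimal sketch (assuming jq is installed, and that --format=json returns an array of objects with a name field) that dumps the details of every scheduled query in one go:

# list every transfer config name, then show the full details of each one
for name in $(bq ls --transfer_config --transfer_location=US --format=json | jq -r '.[].name'); do
    bq show --format=prettyjson --transfer_config "$name"
done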

Further details can be found here.

够拽才男人 · #4 · 2020-07-23 05:04

Scheduled queries are part of BigQuery's Data Transfer Service, so you have to use its API; in particular, the projects.transferConfigs.list method. Fill in the dataSourceIds field with scheduled_query and parent with projects/PROJECT_ID. As discussed in the comments, if you are using a regional location such as europe-west2 instead of a multi-regional one (EU or US), you should use projects.locations.transferConfigs.list instead. In that case, the parent resource will be of the form projects/PROJECT_ID/locations/REGIONAL_LOCATION.

In addition, for other transfers you can get the corresponding dataSourceIds using the projects.dataSources.list method. That's how I got the scheduled_query one.
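
For instance, a quick way to list them from the shell (a sketch mirroring the curl call below; the us location is an assumption, adjust as needed):

# list the data sources available for transfers in this project/location
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://bigquerydatatransfer.googleapis.com/v1/projects/$(gcloud config get-value project 2>/dev/null)/locations/us/dataSources"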

The response will be an array of scheduled queries such as:

{
  "name": "projects/<PROJECT_NUMBER>/locations/us/transferConfigs/<TRANSFER_CONFIG_ID>",
  "destinationDatasetId": "<DATASET>",
  "displayName": "hacker-news",
  "updateTime": "2018-11-14T15:39:18.897911Z",
  "dataSourceId": "scheduled_query",
  "schedule": "every 24 hours",
  "nextRunTime": "2019-04-19T15:39:00Z",
  "params": {
    "write_disposition": "WRITE_APPEND",
    "query": "SELECT @run_time AS time,\n  title,\n  author,\n  text\nFROM `bigquery-public-data.hacker_news.stories`\nLIMIT\n  1000",
    "destination_table_name_template": "hacker_daily_news"
  },
  "state": "SUCCEEDED",
  "userId": "<USER_ID>",
  "datasetRegion": "us"
}

Example of an API call with bash and curl:

#!/bin/bash

# parameter(s)
location=europe-west2

authToken="$(gcloud auth print-access-token)"
projectId=$(gcloud config get-value project 2>/dev/null)

# API call
scheduled_queries=$(curl -H "Authorization: Bearer $authToken" \
"https://bigquerydatatransfer.googleapis.com/v1/projects/$projectId/locations/$location/transferConfigs?dataSourceIds=scheduled_query")

# pretty print results
echo "$scheduled_queries" | python -m json.tool
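
Since the original question is about working out which tables each scheduled query reads and writes, a follow-up sketch (assuming jq is installed) that extracts the destination and SQL text of each config for further analysis:

# print destination and SQL for each scheduled query
echo "$scheduled_queries" | jq -r '.transferConfigs[]? |
  "-- \(.displayName) -> \(.destinationDatasetId // "?").\(.params.destination_table_name_template // "?")\n\(.params.query)\n"'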