-->

BigQuery : is it possible to execute another query

2019-07-23 23:17发布

问题:

I have a table that records a row for each unique user per day with some aggregated stats for that user on that day, and I need to produce a report that tells me for each day, the no. of unique users in the last 30 days including that day.

eg.

  • for Aug 31st, it'll count the unique users from Aug 2nd to Aug 31st
  • for Aug 30th, it'll count the unique users from Aug 1st to Aug 30th
  • and so on...

I've looked at some related questions but they aren't quite what I need - if a user logs in on multiple days in the last 30 days he should be counted only once, so I can't just sum the DAU count for the last 30 days.

Bigquery SQL for sliding window aggregate

BigQuery SQL for 28-day sliding window aggregate (without writing 28 lines of SQL)

So far, my ideas are to either:

  • write a simple script that'll execute a separate BigQuery for each of the relevant days
  • write a BigQuery UDF that'll execute basically the same query for each day selected from another query

but I've not found any examples on how to execute another BigQuery query inside an UDF, or if it's possible at all.

回答1:

I need to produce a report that tells me for each day, the no. of unique users in the last 30 days including that day.

Below should do this

SELECT
  calendar_day, 
  EXACT_COUNT_DISTINCT(userID) AS unique_users
FROM (
  SELECT calendar_day, userID
  FROM YourTable
  CROSS JOIN (
    SELECT DATE(DATE_ADD('2016-08-08', pos - 1, "DAY")) AS calendar_day
    FROM (
         SELECT ROW_NUMBER() OVER() AS pos, *
         FROM (FLATTEN((
         SELECT SPLIT(RPAD('', 1 + DATEDIFF('2016-09-08', '2016-08-08'), '.'),'') AS h
         FROM (SELECT NULL)),h
    )))
  ) AS calendar
  WHERE DATEDIFF(calendar_day, dt) BETWEEN 0 AND 29 
)
GROUP BY calendar_day
ORDER BY calendar_day DESC

It assumes YourTable has userID and dt fields (like below for example)

dt          userID   
2016-09-08       1   
2016-09-08       2   
...

And you can control:
- reporting dates range by changing respectively 2016-08-08 and 2016-09-08
- aggregation size by changing 29 in BETWEEN 0 AND 29