-->

Firebase BigQuery Event Reporting in Data Studio

2020-03-31 08:25发布

问题:

I would like to create two dynamic weekly BigQuery Firebase Reports, reflecting the most recent 12 Weeks of data for:

  1. Event Occurrences per Week
  2. DISTINCT Users who triggered Event Occurrences per Week

The inspiration for these reports came from a Tableau-report I saw online:

I am trying to create dynamic timestamps to pull the values of 1.count of event.name and 2.distinct user_dim.app_info.app_instance_id.

My First Iteration:

SELECT  event.name as event_name, 
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170724' AND _TABLE_SUFFIX < '20170731' THEN event.name END) AS W1,  
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170731' AND _TABLE_SUFFIX < '20170807' THEN event.name END) AS W2,  
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170807' AND _TABLE_SUFFIX < '20170814' THEN event.name END) AS W3,  
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170814' AND _TABLE_SUFFIX < '20170821' THEN event.name END) AS W4,  
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170821' AND _TABLE_SUFFIX < '20170828' THEN event.name END) AS W5,  
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170828' AND _TABLE_SUFFIX < '20170904' THEN event.name END) AS W6,  
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170904' AND _TABLE_SUFFIX < '20170911' THEN event.name END) AS W7,  
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170911' AND _TABLE_SUFFIX < '20170918' THEN event.name END) AS W8,  
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170918' AND _TABLE_SUFFIX < '20170925' THEN event.name END) AS W9,  
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170925' AND _TABLE_SUFFIX < '20171002' THEN event.name END) AS W10,  
COUNT(CASE WHEN _TABLE_SUFFIX >= '20171002' AND _TABLE_SUFFIX < '20171009' THEN event.name END) AS W11,  
COUNT(CASE WHEN _TABLE_SUFFIX >= '20171009' AND _TABLE_SUFFIX < '20171016' THEN event.name END) AS W12  
FROM `<<project-id>>.app_events_*`, UNNEST(event_dim) AS event
WHERE  _TABLE_SUFFIX >= '20170724' AND _TABLE_SUFFIX < '20171016'
GROUP BY event_name
ORDER BY event_name DESC;

I also played around with sudo code below:

# Enable StandardSQL
SELECT week, event.name as event_name, COUNT(event.name) as event_count
FROM 
(TABLE_DATE_RANGE([firebase-analytics-sample-data:android_dataset.app_events_],DATE_ADD(CURRENT_TIMESTAMP(), -1, 'WEEK'), DATE_ADD(CURRENT_TIMESTAMP(), -1, 'WEEK'))), ....
....
....
(TABLE_DATE_RANGE([firebase-analytics-sample-data:android_dataset.app_events_],DATE_ADD(CURRENT_TIMESTAMP(), -12, 'WEEK'), DATE_ADD(CURRENT_TIMESTAMP(), -12, 'WEEK'))),
UNNEST(event_dim) as event
GROUP BY event_name
ORDER BY week DESC
LIMIT 12;

Note: wildcard table does not support views. The query returns an error if the wildcard table matches any views in the dataset. Currently, cached results are not supported for queries against multiple tables using a wildcard function (even if the Use Cached Results option is checked). If you run the same wildcard query multiple times, you are billed for each query. Wildcard tables support native BigQuery storage only. You cannot use wildcards when querying an external table or a view (source).

To summarise:

  • It would be a lot faster if I did not have to manually input the date fields every week, but the script "knows" the current week's index number we are in, and updates the 12 most recent weeks' data.

Appendix

  • Week 30 July 24, 2017 July 30, 2017 20170724 20170730
  • Week 31 July 31, 2017 August 6, 2017 20170731 20170806
  • Week 32 August 7, 2017 August 13, 2017 20170807 20170813
  • Week 33 August 14, 2017 August 20, 2017 20170814 20170820
  • Week 34 August 21, 2017 August 27, 2017 20170821 20170827
  • Week 35 August 28, 2017 September 3, 2017 20170828 20170903
  • Week 36 September 4, 2017 September 10, 2017 20170904 20170910
  • Week 37 September 11, 2017 September 17, 2017 20170911 20170917
  • Week 38 September 18, 2017 September 24, 2017 20170918 20170924
  • Week 39 September 25, 2017 October 1, 2017 20170925 20171001
  • Week 40 October 2, 2017 October 8, 2017 20171002 20171008
  • Week 41 October 9, 2017 October 15, 2017 20171009 20171015

回答1:

It would be a lot faster if i did not have to manually input the date fields every week), but the script "knows" the week-number we are in, and updates the 12 most recent weeks' data given the week we are in.

Below is for BigQuery Standard SQL

WHERE _TABLE_SUFFIX 
  BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 12 * 7 + EXTRACT(DAYOFWEEK FROM CURRENT_DATE()) - 2 DAY)) 
  AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL EXTRACT(DAYOFWEEK FROM CURRENT_DATE()) - 1 DAY))

Below just shows the output

#standardSQL
SELECT 
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 2 * 7 + EXTRACT(DAYOFWEEK FROM CURRENT_DATE()) - 2 DAY)) first_day,
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL EXTRACT(DAYOFWEEK FROM CURRENT_DATE()) - 1 DAY)) last_day  

-

first_day   last_day     
20171002    20171015       

Whenever you run above script - it will return you the start and end of most recent 12 weeks period

Update for:

I posted my first iteration answer above in what i would want

#standardSQL
SELECT 
  CONCAT(
    FORMAT_DATE('Week %W %d %B %Y, ', first_day),
    FORMAT_DATE('%d %B %Y, ', last_day), 
    FORMAT_DATE('%Y%m%d ', first_day),
    FORMAT_DATE('%Y%m%d', last_day)
  ) wk
FROM (
  SELECT 
    DATE_SUB(CURRENT_DATE(), INTERVAL 1 * 7 + EXTRACT(DAYOFWEEK FROM CURRENT_DATE()) - 2 DAY) first_day,
    DATE_SUB(CURRENT_DATE(), INTERVAL EXTRACT(DAYOFWEEK FROM CURRENT_DATE()) - 1 DAY) last_day 
)

with the output

wk   
Week 41 09 October 2017, 15 October 2017, 20171002 20171015