How to create a channelpath in BigQuery based on e

2019-07-24 19:04发布

问题:

I want to create channelpaths in Bigquery on a user level. I want the path to end when a transaction occurs. The next visit will then start a new path. Currently I have one path per user summing all the transactions. See the provided code below. I've also included the current OUTPUT TABLE and the desired OUTPUT TABLE.

My idea would be to create a new column that is counting the transactions. This value would start at 0 and need to be incremented by 1 AFTER a transaction occured. Then I would merge this value with the user_id value and group the aggregated strings on that variable. But I dont know how to do this.

Thanks in advance!

Guido

#standardSQL
WITH yourTable AS (
  SELECT 1 AS user_id,'1a' as visit_id, '2017-01-01 14:10:12' AS DATETIME, 
'google cpc' AS channelgrouping, 0 AS transaction , 1 as visit UNION ALL
  SELECT 1, '1b', '2017-01-01 20:10:12', 'email', 1, 1 UNION ALL
  SELECT 1, '1c','2017-01-03 08:10:12', 'direct', 0, 1 UNION ALL
  SELECT 1, '1d','2017-01-04 13:10:14', 'organic', 1, 1
)
SELECT 
  user_id, 
  STRING_AGG(channelgrouping, ' > ' ORDER BY DATETIME) AS channelgrouping_path,
  SUM(transaction) AS transaction,
  SUM(visit) AS visits
 FROM yourTable
GROUP BY user_id

OUTPUT TABLE

user_id|channgelgrouping_path                |Transactions|Visits
1      |google cpc > email > direct > organic| 2          | 4

DESIRED OUTPUT TABLE

user_id|channgelgrouping_path                |Transactions|Visits
1      |google cpc > email                   | 1          | 2
1      |direct > organic                     | 1          | 2

回答1:

Try below

#standardSQL
WITH yourTable AS (
  SELECT 1 AS user_id,'1a' AS visit_id, '2017-01-01 14:10:12' AS DATETIME, 
'google cpc' AS channelgrouping, 0 AS transaction , 1 AS visit UNION ALL
  SELECT 1, '1b', '2017-01-01 20:10:12', 'email', 1, 1 UNION ALL
  SELECT 1, '1c','2017-01-03 08:10:12', 'direct', 0, 1 UNION ALL
  SELECT 1, '1d','2017-01-04 13:10:14', 'organic', 1, 1
)
SELECT
  user_id, 
  STRING_AGG(channelgrouping, ' > ' ORDER BY DATETIME) AS channelgrouping_path,
  SUM(transaction) AS transaction,
  SUM(visit) AS visits
FROM (
  SELECT 
    *, 
    SUM(transaction) OVER(PARTITION BY user_id ORDER BY datetime 
                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS grp
  FROM yourTable
)
GROUP BY user_id, IFNULL(grp, 0)