Big Query - Google Analytics - Time diff between f

2019-04-01 19:08发布

Trying to get a list:

visitorid, time first visit, time of hit where transaction occurred.

What've I've written is only grabbing rows that have transaction revenue. I am also trying to convert visitStartTime which is a unix date, to a regular date via Date(visitStartTime) but that's failing in the group by because of the outputted date.

Any direction super helpful.

SELECT 
  fullvisitorID, 
  visitNumber,
  visitStartTime,
  hits.transaction.transactionRevenue

FROM 
  [75718103.ga_sessions_20150310],
  [75718103.ga_sessions_20150309],
  [75718103.ga_sessions_20150308],
  [75718103.ga_sessions_20150307],
  [75718103.ga_sessions_20150306],
  [75718103.ga_sessions_20150305],
  [75718103.ga_sessions_20150304],
  [75718103.ga_sessions_20150303],
  [75718103.ga_sessions_20150302],


WHERE totals.transactions >=1

GROUP BY    
  fullvisitorID, visitNumber, visitStartTime, hits.transaction.transactionRevenue;

2条回答
smile是对你的礼貌
2楼-- · 2019-04-01 19:28

visitStartTime is defined as POSIX time in Google Analytics schema, which means number of seconds since epoch. BigQuery TIMESTAMP is encoded as number of microseconds since epoch. Therefore, to get start time as TIMESTAMP, I used TIMESTAMP(INTEGERvisitStartTime*1000000)). hits.time contains number of milliseconds since first hit, therefore to get time of transactions, they needed to be multiplied by 1000 to get to microsecond granularity, hence TIMESTAMP(INTEGER(visitStartTime*1000000 + hits.time*1000)). Since hits is repeated RECORD, no GROUP BY is necessary, the data model already has all the hits grouped together. Putting it all together:

SELECT
  fullVisitorId,
  timestamp(integer(visitStartTime*1000000)) as start_time, 
  timestamp(integer(visitStartTime*1000000 + hits.time*1000)) as transaction_time
FROM 
 [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910] 
WHERE hits.transaction.transactionRevenue > 0
查看更多
狗以群分
3楼-- · 2019-04-01 19:29

Mosha's solution is simple and elegant, but is too simple, actually it calculates the time between the first pageview and each transaction inside one visit, so it does not calculate the time between the first visit and the first transaction of one visitor. So if you calculate the average time using Mosha's query it will be 1.33 minute. But if you use the query I created it will be 9.91 minutes. My SQL skills are quite rusted, so it probably can be improved.

Masha's query (avg. time between the first pageview and each transaction inside one visit):

SELECT ROUND(AVG(MinutesToTransaction),2) AS avgMinutesToTransaction FROM (
    SELECT
        fullVisitorId,
        timestamp(integer(visitStartTime*1000000)) as start_time, 
        timestamp(integer(visitStartTime*1000000 + hits.time*1000)) as transaction_time,
        ROUND((TIMESTAMP_TO_SEC(timestamp(integer(visitStartTime*1000000 + hits.time*1000))) - TIMESTAMP_TO_SEC(timestamp(integer(visitStartTime*1000000)) )) / 60, 2) AS MinutesToTransaction
    FROM 
        [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910] 
    WHERE hits.transaction.transactionRevenue > 0
)

My query (avg. time between the first visit and the first transaction of one visitor):

SELECT ROUND(AVG(MinutesToTransaction),2) AS avgMinutesToTransaction FROM (
    SELECT firstInteraction.fullVisitorId, 
        MIN(firstInteraction.visitId) AS firstInteraction.visitId,
        TIMESTAMP(INTEGER(MIN(firstInteraction.visitStartTime)*1000000)) AS timeFirstInteraction,
        firstTransaction.visitId,
        firstTransaction.timeFirstTransaction,
        FIRST(BOOLEAN(firstInteraction.visitId = firstTransaction.visitId)) AS transactionInFirstVisit,
        ROUND((TIMESTAMP_TO_SEC(firstTransaction.timeFirstTransaction) - TIMESTAMP_TO_SEC(TIMESTAMP(INTEGER(MIN(firstInteraction.visitStartTime)*1000000)))) / 60, 2) AS MinutesToTransaction
    FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910] firstInteraction
    INNER JOIN (
        SELECT
            fullVisitorId,
            visitId,
            TIMESTAMP(INTEGER(MIN(visitStartTime*1000000 + hits.time*1000))) AS timeFirstTransaction
        FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910] 
        WHERE hits.type = "TRANSACTION"
        GROUP BY 1, 2
    ) AS firstTransaction
    ON  firstInteraction.fullVisitorId = firstTransaction.fullVisitorId
    GROUP BY 1, 4, 5
)

I left some extra fields so if you use it without the first SELECT you can see some interesting data.

Ps: Thanks Mosha for showing how to calculate the time.

查看更多
登录 后发表回答