Page sequence paths Google Analytics data in BigQu

2019-08-25 17:00发布

问题:

I am working in BigQuery with Google Analytics data on a topic to understand the page sequence paths. The pages are scoped to the custom dimension (hit level) with is grouping a set of pages. Furthermore, TimeStamp is included to gain more insights about the sequence.

Query (legacy sql):

SELECT 
date,
STRFTIME_UTC_USEC(SEC_TO_TIMESTAMP(visitStartTime + hits.time/1000),"%Y-%m-%d %H:%M:%S") AS TimeStamp,  
concat(fullvisitorid,".", string(visitId)) as sessionsid, 
MAX(IF(hits.customDimensions.index=3, hits.customDimensions.value, NULL)) 
WITHIN hits AS CustomDimension, 
hits.hitNumber as hits.hitNumber, 
visitNumber,    
FROM
TABLE_DATE_RANGE([XXXXXXXX.ga_sessions_], TIMESTAMP('2017-08-01'), 
TIMESTAMP('2017-08-05'))
having sessionsid = "3712616268105648149.1501799276"
ORDER BY
hits.hitNumber AS

Outcome (bases on 1 sessionid): Image outcome query

I’ve got two questions:

1) In the example below the CustomDimension “Thankyou” is the “Thank you” page. Strange fact is that CustomDimension “thankYou” is counted more than once. In this case 2 seconds after each other, but in other cases sometimes at the same moment. GA tracking seems to be fine and can only can send 1 hit (not taking a refresh into consideration). Any ideas if I missing something in the query that is effecting this behaviour?

2) The page sequence is starting CustomDimension “ThankYou”. This is the Thank you page of the order process and then the users continues to the Homepage. It is not logic that a user starts at this page. Do you know if this is caused by the query?

Hope that somebody can give me some advice to put me in the right direction.

Thanks.

回答1:

Related information and handy examples are to be found on the "BigQuery cookbook" on-line document, "Sequence of hits" sub-chapter. The general pattern is given as:

SELECT fullfullVisitorID, visitID, visitNumber, hits.hitNumber, hits.page.pagePath
FROM [ ‘Dataset Name’ ]
WHERE hits.type=[ ‘type of hit’ ];

On the first line, you identify the unique combination of visitor ID and visit IDs. Every hit will belong to a unique combination of these two dimensions. On the first line, you also list the elements of a page path. The FROM statement lists the data source(s). In the WHERE statement, you limit the query to specific types of hits to control what kinds of interactions you observe.

A directly relevant example can be read on the "LunaMetrics Google BigQuery Recipes" page, more precisely as the "All Related Pages in a Session" query:

SELECT
  cur_hit.fullVisitorId AS fullVisitorId,
  cur_hit.visitId AS visitID,
  cur_hit.hits.hitNumber AS cur_hitnumber,
  cur_hit.hits.page.pagePath as cur_pagePath,
  cur_hit.hits.time AS cur_time,
  MAX(prev_hit.hits.hitNumber) AS prev_hitNumber,
FROM
  FLATTEN([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910], hits) AS prev_hit
INNER JOIN
  FLATTEN([google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910], hits) AS cur_hit
ON
  prev_hit.fullvisitorid = cur_hit.fullvisitorid
  AND prev_hit.visitid = cur_hit.visitid
WHERE
  prev_hit.hits.hitNumber < cur_hit.hits.hitNumber
  AND prev_hit.hits.type = "PAGE"
  AND cur_hit.hits.type = "PAGE"
GROUP BY fullVisitorId, visitID, cur_hitnumber, cur_pagePath, cur_time