UNNEST expression references column which is neith

2020-04-08 13:43发布

问题:

Google Analytics BigQuery tables are structured like this (Legacy SQL notations - only relevant fields are shown):

visitId:                      INTEGER
hits:                         RECORD/REPEATED
hits.hour:                    INTEGER

On one such table, the following query works well:

SELECT
  visitId,
  MIN(h.hour) AS firstHitHour
FROM
  `my-table.ga_sessions_20161122`, UNNEST(hits) AS h
GROUP BY
  visitId

But using this alternative syntax:

SELECT
  visitId,
  (SELECT MIN(hour) FROM UNNEST(hits)) as firstHitHour
FROM
  `my-table.ga_sessions_20161122`
GROUP BY
  visitId

Triggers the following error:

Error: UNNEST expression references column hits which is neither grouped nor aggregated

I understand that UNNEST(hits) must be somehow grouped or aggregated, but since this column is an array (repeated), what does it mean exactly?

If I try to "group the column hits", as requested, like this:

(SELECT MIN(hour) FROM UNNEST(hits) as h GROUP BY h) as firstHitHour

Then I get a Grouping by expressions of type STRUCT is not allowed error.

How can this alternative syntax be corrected to produce the same result as the first one?

回答1:

My first Answer is for original version of this question.
When I answered, I realized you have changed it to quite different one :o)

So below answer is for most recent version of your question:

I think that in "alternative" version you just do not need GROUP BY at all, because you operate on original (un-flattened) row by row and for each row (visitId) you calculate firstHitHour

SELECT
  visitId,
  (SELECT MIN(hour) FROM UNNEST(hits)) as firstHitHour
FROM
  `my-table.ga_sessions_20161122`

In your initial query - you kind of flattening all records for each row - so that's why you need then to group them back



回答2:

Try below (it uses example from your original question):

SELECT
  visitId, source, medium, browser,
  MIN(hour) AS firstHitHour,
  LOGICAL_OR(hasValue) AS hasValue
FROM (
  SELECT
    visitId,
    trafficSource.source AS source,
    trafficSource.medium AS medium,
    device.browser AS browser,
    h.hour AS hour,
    EXISTS(SELECT 1 FROM UNNEST(hits) WHERE eventInfo.eventCategory = "SomeValue") AS hasValue
  FROM
   `my-table.ga_sessions_20161122`, UNNEST(hits) AS h
)
GROUP BY
visitId, source, medium, browser;