Im currently processing data in BigQuery then export into Excel to do the final Pivot table and was hoping to be able to create the same with the PIVOT option in BigQuery.
My Data set in big query looks like
Transaction_Month || ConsumerId || CUST_createdMonth
01/01/2015 || 1 || 01/01/2015
01/01/2015 || 1 || 01/01/2015
01/02/2015 || 1 || 01/01/2015
01/01/2015 || 2 || 01/01/2015
01/02/2015 || 3 || 01/02/2015
01/02/2015 || 4 || 01/02/2015
01/02/2015 || 5 || 01/02/2015
01/03/2015 || 5 || 01/02/2015
01/03/2015 || 6 || 01/03/2015
01/04/2015 || 6 || 01/03/2015
01/06/2015 || 6 || 01/03/2015
01/03/2015 || 7 || 01/03/2015
01/04/2015 || 8 || 01/04/2015
01/05/2015 || 8 || 01/04/2015
01/04/2015 || 9 || 01/04/2015
It is essentially an order table with customer information appended.
When i put this data into excel I add it to a pivot table, I add the CUST_createdMonth as a Row, Transaction_Month as a column and the value is a distinct Count of the ConsumerID
The output looks as follows
Is this sort of pivot possible in BigQuery?
There is no nice way of doing this in BigQuery, but you can do it follow below idea
Step 1
Run below query
SELECT 'SELECT CUST_createdMonth, ' +
GROUP_CONCAT_UNQUOTED(
'EXACT_COUNT_DISTINCT(IF(Transaction_Month = "' + Transaction_Month + '", ConsumerId, NULL)) as [m_' + REPLACE(Transaction_Month, '/', '_') + ']'
)
+ ' FROM yourTable GROUP BY CUST_createdMonth ORDER BY CUST_createdMonth'
FROM (
SELECT Transaction_Month
FROM yourTable
GROUP BY Transaction_Month
ORDER BY Transaction_Month
)
As a result - you will get string like below (it is formatted below for readability sake)
SELECT
CUST_createdMonth,
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/01/2015", ConsumerId, NULL)) AS [m_01_01_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/02/2015", ConsumerId, NULL)) AS [m_01_02_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/03/2015", ConsumerId, NULL)) AS [m_01_03_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/04/2015", ConsumerId, NULL)) AS [m_01_04_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/05/2015", ConsumerId, NULL)) AS [m_01_05_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/06/2015", ConsumerId, NULL)) AS [m_01_06_2015]
FROM yourTable
GROUP BY
CUST_createdMonth
ORDER BY
CUST_createdMonth
Step 2
Just run above composed query
Result will be lik e below
CUST_createdMonth m_01_01_2015 m_01_02_2015 m_01_03_2015 m_01_04_2015 m_01_05_2015 m_01_06_2015
01/01/2015 2 1 0 0 0 0
01/02/2015 0 3 1 0 0 0
01/03/2015 0 0 2 1 0 1
01/04/2015 0 0 0 2 1 0
Note
Step 1 is helpful if you have many months to pivot so too much of manual work.
In this case - Step 1 helps you to generate your query
You can see more about pivoting in my other posts.
How to scale Pivoting in BigQuery?
Please note – there is a limitation of 10K columns per table - so you are limited with 10K organizations.
You can also see below as simplified examples (if above one is too complex/verbose):
How to transpose rows to columns with large amount of the data in BigQuery/SQL?
How to create dummy variable columns for thousands of categories in Google BigQuery?
Pivot Repeated fields in BigQuery
Actually Mikhail there is another way to transpose rows of an EAV type schema into columns, by using logging tables and querying the last CREATE TABLE entry to determine the latest table schema.
CREATE TEMP FUNCTION jsonSchemaStringToArray(jsonSchema String)
RETURNS ARRAY<STRING> AS ((
SELECT
SPLIT(
REGEXP_REPLACE(REPLACE(LTRIM(jsonSchema,'{ '),'"fields": [',''), r'{[^{]+"name": "([^\"]+)"[^}]+}[, ]*', '\\1,')
,',')
));
WITH valid_schema_columns AS (
WITH array_output aS (SELECT
jsonSchemaStringToArray(jsonSchema) AS column_names
FROM (
SELECT
protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.schemaJson AS jsonSchema
, ROW_NUMBER() OVER (ORDER BY metadata.timestamp DESC) AS record_count
FROM `realself-main.bigquery_logging.cloudaudit_googleapis_com_data_access_20170101`
WHERE
protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.tableId = '<table_name>'
AND
protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.datasetId = '<schema_name>'
AND
protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.createDisposition = 'CREATE_IF_NEEDED'
) AS t
WHERE
t.record_count = 1 -- grab the latest entry
)
-- this is actually what UNNESTS the array into standard rows
SELECT
valid_column_name
FROM array_output
LEFT JOIN UNNEST(column_names) AS valid_column_name
)