BigQuery UPDATE nested array field

2019-03-02 15:44发布

问题:

I need to update nested field in one table, using the value from another table. Using this solution i came up with something that works, but not exactly as i want it to. Here is my solution:

#standardSQL
UPDATE
  `attribution.daily_sessions_20180301_copy1` AS target
SET
hits = ARRAY(
  SELECT AS STRUCT * REPLACE(ARRAY(
    SELECT AS STRUCT *
    FROM(
      SELECT AS STRUCT * REPLACE(map.category AS productCategoryAttribute) FROM UNNEST(product))) AS product) FROM UNNEST(hits)
)
FROM
  `attribution.attribute_category_map`
AS map
WHERE
  (
    SELECT REPLACE(LOWER(prod.productCategory), 'amp;', '') FROM UNNEST(target.hits) AS h,
    UNNEST(h.product) AS prod LIMIT 1) = map.raw_name

attribute_category_map is a table with two columns where i look for corresponding value in column 1 and replace data in target table with value from column 2. Best result i achieved - updated all nested fields on one row with the same value, wich is only correct for the first nested field, instead of updating each nested field with specific value.

Simplified schema of the main table:

[  
   {  
      "name":"sessionId",
      "type":"STRING",
      "mode":"NULLABLE"
   },
   {  
      "name":"hits",
      "type":"RECORD",
      "mode":"REPEATED",
      "fields":[  
         {  
            "name":"product",
            "type":"RECORD",
            "mode":"REPEATED",
            "fields":[  
               {  
                  "name":"productCategory",
                  "type":"STRING",
                  "mode":"NULLABLE"
               },
               {  
                  "name":"productCategoryAttribute",
                  "type":"STRING",
                  "mode":"NULLABLE"
               }
            ]
         }
      ]
   }
]

There are usually several hits within session row and several products within one hit. With values looking like those (if you unnest):

-----------------------------------------------------------------------------
sessionId | hits.product.productCategory| hit.product.productCategoryAttribute
-----------------------------------------------------------------------------
1         | automotive chemicals        | null
1         | automotive tools            | null
1         | null                        | null
2         | null                        | null
2         | automotive chemicals        | null
2         | null                        | null
3         | null                        | null
3         | bed accessories             | null
4         | null                        | null
4         | null                        | null
4         | automotive chemicals        | null
4         | null                        | null
-----------------------------------------------------------------------------

Schema of the map table:

[  
   {  
      "name":"raw_name",
      "type":"STRING",
      "mode":"NULLABLE"
   },
   {  
      "name":"category",
      "type":"STRING",
      "mode":"NULLABLE"
   }
]

with values like this:

---------------------------------------------------
raw_name              |category                   |
---------------------------------------------------
automotive chemicals  |d1y2 - automotive chemicals|
automotive paint      |dijf1 - automotive paint   |
automotive tools      |efw1 - automotive tools    |
baby & infant toys    |wwfw - baby & infant toys  |
batteries & power     |fdsv- batteries & power    |
bed accessories       |0k77 - bed accessories     |
bike racks            |12df - bike racks          |
--------------------------------------------------

What i want as a result:

-----------------------------------------------------------------------------
    sessionId | hits.product.productCategory| hit.product.productCategoryAttribute
-----------------------------------------------------------------------------
    1         | automotive chemicals        | d1y2 - automotive chemicals
    1         | automotive tools            | efw1 - automotive tools
    1         | null                        | null
    2         | null                        | null
    2         | automotive chemicals        | d1y2 - automotive chemicals
    2         | null                        | null
    3         | null                        | null
    3         | bed accessories             | 0k77 - bed accessories
    4         | null                        | null
    4         | null                        | null
    4         | automotive chemicals        | d1y2 - automotive chemicals
    4         | null                        | null
    -----------------------------------------------------------------------------

I need to take value productCategory from main table, look it up in map table in column raw_name, take value from colum category and put it to productCategoryAttribute column of main table. Main problem is target fields are double nested and i can't figure out how to join directly to them.

回答1:

Below is tested!
Leaves whole table's schema/data as is and only updates values of productCategoryAttribute based on the respective mapping

#standardSQL
UPDATE `project.dataset.your_table` t
SET hits = 
  ARRAY(
    SELECT AS STRUCT * REPLACE(
      ARRAY(
        SELECT AS STRUCT product.* REPLACE(
          CASE WHEN map.raw_name = product.productCategory THEN category 
            ELSE productCategoryAttribute END AS productCategoryAttribute)
        FROM UNNEST(product) product
        LEFT JOIN UNNEST(agg_map.map) map 
        ON map.raw_name = product.productCategory
      ) AS product)
    FROM UNNEST(hits) hit
  ) 
FROM (SELECT ARRAY_AGG(row) map FROM `project.dataset.map` row) agg_map 
WHERE TRUE   

Note: above solution assumes that map table is not that big as it relies on aggregating whole map table into one array