BigQuery check for array overlap

2020-04-18 07:19发布

问题:

So I'm writing a BigQuery query and basically just need to be able to check if any of a number of strings are present as elements in one of the columns of the table, where the cared-about column itself contains arrays of strings. Just for context, I'm writing the query as part of a little automated Python job and am using standard SQL.

I couldn't find anything that would explicitly check for array inclusion here: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators

So I came up with a solution that employs a pretty hacky regex, specifically:

...other query stuff...

WHERE
    REGEXP_CONTAINS((LOWER(ARRAY_TO_STRING(column, '-'))), r"({joined_string})")

...where column is the column I care about in the table, and joined_string is a long string composed of all the strings I need to check for joined by | (where | serves as the regex OR operator).

Does there exist some kind of built-in functionality in BigQuery standard SQL that allows one to do this more sanely?

回答1:

Below are two examples.

First assuming you have your strings in another table strings

#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
  SELECT 2, ['123', '456', '789'] UNION ALL
  SELECT 3, ['135', '246', '369'] 
),
strings AS (
  SELECT 'abc' AS str UNION ALL
  SELECT '123' UNION ALL
  SELECT '456'
)
SELECT *
FROM yourTable
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) > 0  

You can add below to SELECT list if you need to see how many strings are matching

(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) AS cnt

Second example assumes you have list of strings packed in Array

#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
  SELECT 2, ['123', '456', '789'] UNION ALL
  SELECT 3, ['135', '246', '369'] 
),
strings AS (
  SELECT ['abc', 'def', '456'] AS strs
)
SELECT yourTable.*
FROM yourTable, strings
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) > 0   

Same as in first example - you can add below to SELECT list to see matches count

(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) AS cnt