I am trying to use an analytic function to get the top 2 countries with patent applications, and within those top 2 countries, get the top 2 application kinds. For example, the answer will look something like this:
country - code
US P
US A
GB X
GB P
Here is the query I am using to get this:
SELECT
country_code,
MIN(count_country_code) count_country_code,
application_kind
FROM (
WITH
A AS (
SELECT
country_code,
COUNT(country_code) OVER (PARTITION BY country_code) AS count_country_code,
application_kind
FROM
`patents-public-data.patents.publications`),
B AS (
SELECT
country_code,
count_country_code,
DENSE_RANK() OVER(ORDER BY count_country_code DESC) AS country_code_num,
application_kind,
DENSE_RANK() OVER(PARTITION BY country_code ORDER BY count_country_code DESC) AS application_kind_num
FROM
A)
SELECT
country_code,
count_country_code,
application_kind
FROM
B
WHERE
country_code_num <= 2
AND application_kind_num <= 2) x
GROUP BY
country_code,
application_kind
ORDER BY
count_country_code DESC
However, unfortunately, I get a "memory exceeded" error due to the over/order/partition. Here is the message:
Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 112% of limit. Top memory consumer(s): sort operations used for analytic OVER() clauses: 98% other/unattributed: 2%
How would I go about doing the above query (or other similar queries) without running into these memory errors? This can be tested on the public dataset here.
One crude way to do it (which only works if the fields have a semi-low cardinality), would be to do it as a straightforward aggregation operation and sort the results in-memory outside the DB. For example: