Google cloud Big query UDF limitations

2019-09-09 16:52发布

问题:

I am facing a problem in Google bigquery. I have some complex computation need to do and save the result in Bigquery. So we are doing that complex computation in Java and saving result in google bigquery with the help of Google cloud dataflow.

But this complex calculation is taking around 28 min to complete in java. Customer requirement is to do within 20 sec.

So we switch to Google bigquery UDF option. One option is Bigquery legacy UDF. Bigquery legacy UDF have limitation that it is processing row one by one so we phased out this option. As we need multiple rows to process the results.

Second option is Scalar UDF. Big query scalar UDF are only can be called from WEB UI or command line and can not be trigger from java client.

If any one have any idea the please provide the direction on the problem how to proceed.

回答1:

You can use scalar UDFs with standard SQL from any client API, as long as the CREATE TEMPORARY FUNCTION statements are passed in the query attribute of the request. For example,

QueryRequest queryRequest =
    QueryRequest
        .newBuilder(
            "CREATE TEMP FUNCTION GetWord() AS ('fire');\n"
                + "SELECT COUNT(DISTINCT corpus) as works_with_fire\n"
                + "FROM `bigquery-public-data.samples.shakespeare`\n"
                + "WHERE word = GetWord();")
        // Use standard SQL syntax for queries.
        // See: https://cloud.google.com/bigquery/sql-reference/
        .setUseLegacySql(false)
        .build();
QueryResponse response = bigquery.query(queryRequest);


回答2:

Big query scalar UDF are only can be called from WEB UI or command line and can not be trigger from java client.

This is not accurate. Standard SQL supports scalar UDFs through CREATE TEMPORARY FUNCTION statement which can be used from any application and any client - it is simply part of the SQL query:

https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions

To learn how to enable Standard SQL, see this documentation: https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql Particularly, simplest thing would be to add #standardSql tag at the beginning of SQL query.