Converting loop with variables to BigQuery SQL

2020-05-06 13:56发布

I have thousands of script that include loop over dataset and use variables for accumulation. Eg:

// assuming that 'ids' is populated from some BQ table
ids = [1, 2, 3, 4, 5]
var1 = "v1"   //initialize variable
var2 = "v2"   //initialize variable

for id in ids
    var1 = var2
    if (id > 2)
      var2 = var1 + "-" + item
    else
      var2 = id
    print(id, var1, var2)

This would produce below output:

1,v2,1
2,1,2
3,2,2-3
4,2-3,2-3-4
5,2-3-4,2-3-4-5

Few other things to consider:

  • Script can involve n number of variables.
  • Variables can act as accumulators or can simply have literals.
  • Condition in loop can be more complex, multiple nested if / else conditions.
  • Script can have nested loop as well which would loop for n times or till certain condition is not met.

Is it possible to support it in BigQuery SQL? I'm open for BigQuery UDF as well but UDF does not support mutable state.

Also since there are thousands of such scripts, instead of doing it manually, I want to automate conversion. Parsing these scripts is not an issue for me, I only need to convert this to equivalent BigQuery SQL. Let me know how I can approach this problem.

Edit: Though I'm more inclined to use SQL only approach but BigQuery script also seems promising option.

2条回答
女痞
2楼-- · 2020-05-06 14:06

I was able to convert SQL query for the aforementioned sample script:

with  t as
(select 1 as id union all
select 2 as id union all
select 3 as id union all
select 4 as id union all
select 5 as id)

select id, if(id = 1, 'v2', lag(var2) over(order by id)) var1, var2
from (
    select
      id,
      if(
        id > 2,
        string_agg(if(id > 1, concat('', '', cast(id as string)), null), '-') over (rows between unbounded preceding and current row),
        cast(id as string)
      ) var2
    from t
  )

And like that I'm also able to convert most of the scripts in pure SQL. It's just building model around it is more involved but at least I'm moving ahead with manual conversions.

查看更多
别忘想泡老子
3楼-- · 2020-05-06 14:13

I have managed to reproduce your script using SQL UDF in BigQuery, I have used StandardSQL instead of Legacy SQL, you can read more about it here. I would like to state that UDF suports JavaScript and I found it better to write it using its functionality.

I used this dummy data instead of declaring an array:

enter image description here

Below it is the script I developed to reproduce what you accomplished using your code.

DECLARE VAR1 STRING DEFAULT 'v1';
DECLARE var2 STRING DEFAULT 'v2';

CREATE TEMPORARY FUNCTION test_v2 ( x1 string , x2 string,id INT64)
RETURNS STRING
LANGUAGE js AS """
var output= new Array(); 
i =0;
          while(i <= 5){
          x1=x2;
          if(i > 2) {
            x2 =x1 + " - " + i ;
            output.push(x2);

          } else{
                  x2 = i;
                  output.push(x2);
                } 
          i++;}return  output[id];
""";

CREATE TEMPORARY FUNCTION test_v1 ( x1 string , x2 string,id INT64)
RETURNS STRING
LANGUAGE js AS """
var output= new Array(); 
i =1;
output.push(x1);
          while(i <= 5){
          x1 = x2;
          if(i > 2) {
            x2 =x1 + " - " + i ;
            output.push(x1);

          } else{
                  output.push(x1);
                  x2 = i;
                } 
          i++;}return  output[id];
""";

SELECT ids,test_v1(var1,var2,ids) as v1, test_v2(var1,var2,ids) as var2 FROM `test-proj-261014.sample.ids` LIMIT 1000

Notice that I had to write two different functions, one for each row var1 and var2. Also, the array starts at position 1 in the test_v1 function.

Finally, the output was:

enter image description here

I hope it helps.

查看更多
登录 后发表回答