BigQuery: error in SPLIT() returns

2019-07-25 11:03发布

问题:

I have a table TabA in BigQuery which has a column ColA and the column ColA has the following structure

1038627|21514184

and the table TabA has more than a million records. I used this to split into multiple columns

SELECT ColA,FIRST(SPLIT(ColA, '/')) part1,
       NTH(2, SPLIT(ColA, '/')) part2
FROM TabA

But for some reason, after certain rows the split seems to be not working properly.

We are getting records like this,

     ColA            part1   part2
1038627|21507470    1038627 21507470     
1038627|21534857    1038627 21507470     
1038627|21546455    1038627 21507470     
1038627|21577167    1038627 21507470

It his happening on a random basis. Not sure where is there error.

SELECT COUNT(*) FROM TabA - returns say 1.7M records


SELECT ColA,FIRST(SPLIT(ColA, '|')) part1, NTH(2, SPLIT(ColA, '|')) part2 FROM TabA - returns 1.7M records with the wrong split


SELECT FIRST(SPLIT(ColA, '|')) part1, NTH(2, SPLIT(ColA, '|')) part2 FROM TabA - returns just 1.4L records with correct split

Don't know what exactly is happening...is it the problem with the data or the problem with the split ??

Any help would be greatly appreciated. Thanks in advance!!

回答1:

is it the problem with the data or the problem with the split ??

To help in troubleshooting - I would recommend running same logic in BigQuery Standard SQL

#standardSQL
SELECT 
  ColA,
  SPLIT(ColA, '|')[SAFE_OFFSET(0)] AS part1,
  SPLIT(ColA, '|')[SAFE_OFFSET(1)] AS part2
FROM TabA