I have a structure
{
id: "123",
scans:[{
"scanid":"123",
"status":"sleep"
}]
},
{
id: "123",
scans:[{
"scanid":"123",
"status":"sleep"
}]
}
Query to remove duplicate:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY id)
row_number,
FROM table1
)
WHERE row_number = 1
I specified destination table as table1.
Here I have made scans as repeated records, scanid as string and status as string. But when I do some query (I am making a query to remove duplicate) and overwrite the existing table, the table schema is changed. It becomes scans_scanid(string)
and scans_status(string)
. Scans record schema is changed now. Please suggest where am I going wrong?
1) If you run the query on the web UI, the result is automatically flattened, so that's why you see the schema is changed.
You need to run your query and write to a destination table, you have options on the web UI also to do this.
2) If you don't run your query on the web UI but still see schema changed, you should make explicit selects so the schema is retained for you eg:
This creates for you a record like output, but it won't be a repeated record for that please read further.
3) For some use cases you may need to use the
NEST(expr)
function whichIt is known that NEST() is not compatible with UnFlatten Results Output and mostly is used for intermediate result in subquery.
Try below workaround
Note, I use INTEGER for id and scanid. If they should be STRING you need to
a. make change in output schema section
as well as
b. remove use of parseInt() function in
t = {scanid:parseInt(x[0]), status:x[1]}
Here I use BigQuery User-Defined Functions. They are extremely powerful yet still have some Limits and Limitations to be aware of. Also have in mind - they are quite a candidates for being qualified as expensive High-Compute queries