For my use case I'm working with data identifiable by unique key at the source exploded into n (non deterministic) number of target entries loaded into BigQuery tables for analytic purposes.
Building this ETL to use Mongo recent Change Stream feature I would like to drop all entries in BigQuery and then load the new entries atomically.
Exploring BigQuery DML I see a MERGE operation is supported, but only WHEN MATCHED THEN DELETE
or WHEN MATCHED THEN UPDATE
is possible.
I'm interested in a WHEN MATCHED THEN DELETE, AND FOLLOW BY AN INSERT operation.
How would I implement such ETL in BigQuery while remaining atomic or eventually consistent as possible in terms of data availability and correctness.
EDIT 1: I would like to provide a concrete example to elaborate.
The lowest granularity of uniqueness I have on this dataset is user_id
. Rows are not uniquely identifiable.
Example
1.
Updated user object received from mongo change stream:
user={_id: "3", name="max", registered="2018-07-05" q=["a", "b", "c"]}
2.
Current BigQuery.user_q holds
| user_id | q |
...
| 3 | a |
| 3 | b |
...
3.
Transform code loads modified user object into BigQuery.user_q_incoming
| user_id | q |
| 3 | a |
| 3 | b |
| 3 | c |
4.
MERGE between user_q
and user_q_incoming
:
- 2 rows in
user_q
that belong touser_id 3
are DELETED - 3 rows in
user_q_incoming
that belong touser_id 3
are INSERTED. - Rest of the data (
...
) inuser_q
is left in place, unmodified.
5.
BigQuery.user_q holds
| user_id | q |
...
| 3 | a |
| 3 | b |
| 3 | c |
...
For example user might delete a question from his profile. Leaving the remaining rows to be q=["a", "c"]
. I need this to translate into the BigQuery world view as well.
There is a similar question and one walk-around to make MERGE work (https://issuetracker.google.com/issues/35905927#comment9).
Basically, something like following should work,
Ideally, following is what you need, but it's not supported yet.
INSERT is supported by BigQuery DML
for example
so, you should be good to go with your ETL
Ok, I see - I think in this case MERGE will not apply as INSERT can be done ONLY for NOT MATCH clause. Someone might figure out how to trick MERGE to work in this case, but meantime below solution does what you want to achieve - i think so :o)