If we have a stream that looks like this:
Person {
    …
    OrganizationID
}
that we want to join with another stream
Organization {
    ID
    …
}
to create a composite record like so:
Person {
    …
    Organization {
        ID
        …
    }
}
What is the most idiomatic and efficient way to do so in the Apache Beam programming model?
NB: I have seen side inputs recommended as a solution to similar problems, but they are not applicable here, since the effect we are after is that every change to either Person or Organization should yield a new augmented Person record.
Edit:
The short answer is that your example is not supported by Apache Beam, because Beam's implementation is missing support for retractions.
===================================================
Original answer:
You might want to check the Join library [1] in Apache Beam.
A join in the Beam model requires extra thought about the windowing strategies of your streams. It sounds like your streams do not require windowing, so say both of them are in the global window. But if you put both streams in the global window, use the default trigger, and join them the way Beam's Join library does, the join will never emit any result, because the watermark never passes the end of an endless window. If you instead set a repeated data-driven trigger (one that fires once it has seen enough elements), then, because Beam lacks support for retractions, it is not clear how previously emitted results would be refined for the join.
[1] https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L49
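To make the requested semantics concrete, here is a minimal plain-Java sketch. It is not Beam code and not a Beam API. It only illustrates "every change to either Person or Organization yields a new augmented Person" by keeping the latest value of each record in memory. The class name LatestValueJoin, the Person id field and the Organization name field are hypothetical, added for illustration. The sketch ignores windowing, distribution and out-of-order data, which are exactly the parts that make this hard in a streaming runner.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration (no Beam): re-emit augmented persons when either side changes. */
class LatestValueJoin {
    // Hypothetical record shapes; only the fields needed for the join are modelled.
    static final class Organization {
        final String id;
        final String name;
        Organization(String id, String name) { this.id = id; this.name = name; }
    }

    static final class Person {
        final String id;
        final String organizationId;
        Person(String id, String organizationId) { this.id = id; this.organizationId = organizationId; }
    }

    static final class AugmentedPerson {
        final String personId;
        final Organization organization;
        AugmentedPerson(String personId, Organization organization) {
            this.personId = personId;
            this.organization = organization;
        }
    }

    // Latest known value per key; LinkedHashMap keeps emission order deterministic.
    private final Map<String, Organization> orgsById = new LinkedHashMap<>();
    private final Map<String, Person> personsById = new LinkedHashMap<>();

    /** A new or updated person yields one augmented record if its organization is already known. */
    List<AugmentedPerson> onPerson(Person p) {
        personsById.put(p.id, p);
        List<AugmentedPerson> out = new ArrayList<>();
        Organization org = orgsById.get(p.organizationId);
        if (org != null) {
            out.add(new AugmentedPerson(p.id, org));
        }
        return out;
    }

    /** A new or updated organization yields a fresh augmented record for every person that refers to it. */
    List<AugmentedPerson> onOrganization(Organization o) {
        orgsById.put(o.id, o);
        List<AugmentedPerson> out = new ArrayList<>();
        for (Person p : personsById.values()) {
            if (p.organizationId.equals(o.id)) {
                out.add(new AugmentedPerson(p.id, o));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        LatestValueJoin join = new LatestValueJoin();
        join.onPerson(new Person("p1", "o1"));
        for (AugmentedPerson ap : join.onOrganization(new Organization("o1", "Acme"))) {
            System.out.println(ap.personId + " -> " + ap.organization.name);
        }
    }
}
```

Note that each update to an organization produces records that supersede earlier ones for the same person. In this sketch the consumer simply has to treat the newest record per person as authoritative; telling downstream consumers that an earlier result is no longer valid is what retractions would provide.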