I have a PCollection in GCP Dataflow/Apache Beam. Instead of processing its elements one by one, I need to combine them into groups of N, something like grouped(N). So, in the case of bounded processing, it would group items into batches of 10, with the last batch holding whatever is left over.
Is this possible in Apache Beam?
Edit: this looks related: Google Dataflow "elementCountExact" aggregation
You should be able to do something similar by assigning elements to the global window and using AfterPane.elementCountAtLeast(N). You still need to account for the case where there aren't enough remaining elements to fire the trigger; a composite trigger with a fallback can cover that (see the sketch below). But you should ask yourself why you need this heuristic in the first place; there is probably a more idiomatic way to solve your problem. Read about Data-Driven Triggers in Beam's programming guide.
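For reference, here is a rough Java sketch of that trigger-based approach, under a few assumptions of my own: a bounded PCollection of Integers, a batch size of 10, a dummy constant key so GroupByKey can collect the fired panes, and a one-minute processing-time fallback trigger. The class name and the fallback delay are illustrative choices, not something from the answer above.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.transforms.windowing.AfterFirst;
import org.apache.beam.sdk.transforms.windowing.AfterPane;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class BatchByN {
  public static void main(String[] args) {
    int batchSize = 10; // the "N" in grouped(N)
    Pipeline p = Pipeline.create();

    PCollection<Integer> input =
        p.apply(Create.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12));

    PCollection<KV<Integer, Iterable<Integer>>> batches = input
        // Put everything under one dummy key so GroupByKey can collect batches.
        .apply(WithKeys.of(0))
        // Global window, re-firing whenever at least N elements are buffered.
        // The processing-time trigger is a fallback so a partial batch is
        // eventually emitted in streaming; on bounded input the final pane
        // fires anyway once the input is exhausted.
        .apply(Window.<KV<Integer, Integer>>into(new GlobalWindows())
            .triggering(Repeatedly.forever(AfterFirst.of(
                AfterPane.elementCountAtLeast(batchSize),
                AfterProcessingTime.pastFirstElementInPane()
                    .plusDelayOf(Duration.standardMinutes(1)))))
            .discardingFiredPanes()
            .withAllowedLateness(Duration.ZERO))
        // Each fired pane becomes one batch of roughly N elements.
        .apply(GroupByKey.<Integer, Integer>create());

    p.run().waitUntilFinish();
  }
}
```

Note that elementCountAtLeast only guarantees a lower bound, so a runner may fire panes with more than N elements, and batches are formed per key and per pane, not globally. If a keyed batching transform is acceptable, Beam also ships GroupIntoBatches.ofSize(N), which may be a more direct fit than hand-rolled triggers.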