I am reading data from a bounded source (a CSV file) in a batch pipeline and would like to assign a timestamp to the elements based on data stored in a column of the CSV file. How do I do this in an Apache Beam pipeline?
If your batch source of data contains an event-based timestamp per element (for example, a click event carrying the tuple {'timestamp', 'userid', 'ClickedSomething'}), you can assign the timestamp to the element within a DoFn in your pipeline.

Java:
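A minimal sketch using `outputWithTimestamp`; `ClickEvent` and `getTimestampMillis()` are hypothetical stand-ins for your element type and the value parsed from your CSV timestamp column:

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Instant;

// 'ClickEvent' and 'getTimestampMillis()' are placeholders for your own type.
PCollection<ClickEvent> stamped = unstamped.apply(
    ParDo.of(new DoFn<ClickEvent, ClickEvent>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        // Parse the event time out of the element itself
        // (e.g. the value read from the CSV 'timestamp' column).
        Instant eventTime = new Instant(c.element().getTimestampMillis());
        // outputWithTimestamp attaches the event time to the element,
        // overriding the default timestamp assigned by the source.
        c.outputWithTimestamp(c.element(), eventTime);
      }
    }));
```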
Python:
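For simple cases a lambda is enough; this sketch assumes each element is a dict whose 'timestamp' field is a Unix timestamp in seconds:

```python
import apache_beam as beam

# Assumes each element is a dict with a Unix-seconds 'timestamp' field.
timestamped = (
    events
    | 'AddTimestamp' >> beam.Map(
        lambda e: beam.window.TimestampedValue(e, e['timestamp'])))
```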
Edit: a non-lambda Python example, adapted from the Beam programming guide:
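This follows the DoFn pattern from the guide; the dict lookup of the 'timestamp' field is again an assumption about your element shape:

```python
import apache_beam as beam

class AddTimestampDoFn(beam.DoFn):
    def process(self, element):
        # Extract the event time from the element itself; here we assume
        # a dict with a Unix-seconds 'timestamp' field.
        unix_timestamp = element['timestamp']
        # Wrap and emit the element with its event-time timestamp attached.
        yield beam.window.TimestampedValue(element, unix_timestamp)

timestamped = events | 'AddTimestamp' >> beam.ParDo(AddTimestampDoFn())
```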
Edit (as per Anton's comment): more information can be found at https://beam.apache.org/documentation/programming-guide/#adding-timestamps-to-a-pcollections-elements