I know that when we implement a ParDo transform, we pick up individual elements from our data (basically separated by "\n"). But what if I have an element that occupies two lines in my file? Can I apply my own condition to decide how elements are picked? Or does an element always have to fit on a single line?
Reading of text files is controlled by TextIO, not by ParDo; I suppose that's what you meant. Indeed, right now TextIO splits files into one element per line; however, there is work in progress on changing that. You can follow the work at https://issues.apache.org/jira/browse/BEAM-2802. It would be useful for that work if you told us more about your file format, to make sure it is in scope.
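In the meantime, one possible workaround is to read each file whole and split it into records yourself. The sketch below is only an illustration and makes several assumptions: it uses the Python SDK and a version that ships `apache_beam.io.fileio`, and it guesses that your records are separated by a blank line. Your actual format may differ. `split_records` and `run` are hypothetical names made up for this example, not part of Beam.

```python
def split_records(text, delimiter="\n\n"):
    """Split a file's full contents into records that may span several lines.

    The blank-line delimiter is an assumption; replace it with whatever
    marks the end of a record in your file format.
    """
    # Normalize Windows line endings so the delimiter matches either way.
    normalized = text.replace("\r\n", "\n")
    # Skip empty chunks (e.g. from trailing blank lines) and trim the
    # newlines left over at the edges of each record.
    return [
        chunk.strip("\n")
        for chunk in normalized.split(delimiter)
        if chunk.strip()
    ]


def run(file_pattern):
    # Imported lazily so split_records can be tried without Beam installed.
    import apache_beam as beam
    from apache_beam.io import fileio

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "MatchFiles" >> fileio.MatchFiles(file_pattern)
            | "ReadWholeFiles" >> fileio.ReadMatches()
            # Each input element is one whole file; emit one element per record.
            | "SplitRecords" >> beam.FlatMap(
                lambda f: split_records(f.read_utf8()))
            | "Print" >> beam.Map(print)
        )
```

For example, `split_records("a\nb\n\nc\nd\n")` gives two elements, `"a\nb"` and `"c\nd"`. Keep in mind that this approach loads each file fully into memory and does not split the work within a single file, so it is probably only suitable when the individual files are reasonably small.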