I'm developing a Python program to use as a Google Dataflow template.
What it does is write data from Pub/Sub into BigQuery:
pipeline_options.view_as(StandardOptions).streaming = True
p = beam.Pipeline(options=pipeline_options)

(p
 # This is the source of the pipeline.
 | 'Read from PubSub' >> beam.io.ReadFromPubSub('projects/.../topics/...')
 # <Transformation code if needed>
 # Destination
 | 'String To BigQuery Row' >> beam.Map(lambda s: dict(Trama=s))
 | 'Write to BigQuery' >> beam.io.Write(
     beam.io.BigQuerySink(
         known_args.output,
         schema='Trama:STRING',
         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)))

p.run().wait_until_finish()
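For completeness, pipeline_options and known_args come from the usual argparse / PipelineOptions boilerplate, roughly like this (simplified sketch; --output is the flag I use for the destination table):

import argparse

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

parser = argparse.ArgumentParser()
# Destination BigQuery table, e.g. PROJECT:DATASET.TABLE
parser.add_argument('--output', required=True)
known_args, pipeline_args = parser.parse_known_args()

# Remaining command-line arguments become the pipeline options.
pipeline_options = PipelineOptions(pipeline_args)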
The code is currently running locally, not on Google Dataflow yet.
This "works" but not the way i want, because currently the data are stored in the BigQuery Buffer Stream and I can not see it (even after waiting some time).
When are gonna be available in BigQuery? Why are stored in the buffer stream instead of the "normal" table?