I would like to consume data from Pub/Sub in a Dataflow streaming job and store it in GCS in hourly directories.
What would be the best approach?
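To make the "hourly directories" layout concrete, here is a minimal stdlib sketch. It is not a Dataflow/Beam API and does not address the slowdown. It assumes a yyyy/MM/dd/HH layout and one-hour fixed windows aligned to the Unix epoch, similar to Beam's FixedWindows(3600) with zero offset. Each event timestamp maps to the start of its hour, and the directory prefix comes from that start. The bucket path and the helper names (`hour_window_start`, `hourly_prefix`) are hypothetical. In a real Beam pipeline, the window start would come from the element's window rather than being computed by hand.

```python
from datetime import datetime, timezone

HOUR_SECONDS = 3600  # one-hour fixed windows (assumption: aligned to the epoch, no offset)


def hour_window_start(event_time_s: float) -> int:
    """Return the start (epoch seconds) of the one-hour window containing event_time_s."""
    # Floor division buckets every timestamp within the same hour to the same start.
    return int(event_time_s // HOUR_SECONDS) * HOUR_SECONDS


def hourly_prefix(base: str, event_time_s: float) -> str:
    """Build a hypothetical hourly directory prefix such as gs://bucket/out/2024/05/01/13/."""
    # Use UTC so every worker places the same instant in the same directory.
    start = datetime.fromtimestamp(hour_window_start(event_time_s), tz=timezone.utc)
    return f"{base.rstrip('/')}/{start:%Y/%m/%d/%H}/"


if __name__ == "__main__":
    # 1714570199 is 2024-05-01T13:29:59Z, so it lands in the 13:00 UTC directory.
    print(hourly_prefix("gs://my-bucket/output", 1714570199))
    # prints "gs://my-bucket/output/2024/05/01/13/"
```

With this layout, all elements from the same hour share one prefix, so each window's output goes to a single directory regardless of which worker writes it.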
I tried using WindowedFilenamePolicy, but it adds an extra group-by step that slows down the writes. Dataflow buffers the data correctly, but it takes too long to write the data to the temp bucket.
Is there a best practice for this fairly common use case?
Regards, Pari