My goal is to access the publish time of a Pub/Sub message, as recorded and set by Google Pub/Sub, in Apache Beam (Dataflow).
PCollection<PubsubMessage> pubsubMsg =
    pipeline.apply("Read Messages From PubSub",
        PubsubIO.readMessagesWithAttributes()
            .fromSubscription(pocOptions.getInputSubscription()));
The resulting PubsubMessage does not seem to contain the publish time as an attribute. I have also tried
.withTimestampAttribute("publish_time")
That did not work either. What am I missing? Is it possible to extract the Google Pub/Sub publish time in Dataflow?
PubsubIO reads the message from Pub/Sub and assigns the message publish time to the element as its record timestamp. You can therefore access it with ProcessContext.timestamp(). As an example, I published a message some time before running the pipeline (to get a noticeable difference between event time and processing time), and the output with DirectRunner was:
Minimal code here
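For illustration, a minimal sketch of reading the element timestamp inside a DoFn might look like the following. The class names PublishTimeExample and LogPublishTimeFn, the describe helper, and the subscription path are hypothetical placeholders, not part of the original pipeline. It assumes the Beam Java SDK and the Google Cloud Platform IO module are on the classpath.

```java
import java.nio.charset.StandardCharsets;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

class PublishTimeExample {

  // Hypothetical helper: formats the payload together with the publish time.
  // Kept separate from the DoFn so it can be checked without running a pipeline.
  static String describe(String payload, long publishTimeMillis) {
    return "payload=" + payload
        + " publishTime=" + java.time.Instant.ofEpochMilli(publishTimeMillis);
  }

  // Hypothetical DoFn: emits one formatted line per Pub/Sub message.
  static class LogPublishTimeFn extends DoFn<PubsubMessage, String> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      String payload = new String(c.element().getPayload(), StandardCharsets.UTF_8);
      // c.timestamp() is the element's record timestamp (an org.joda.time.Instant).
      // When no timestamp attribute is configured on the read, PubsubIO sets it
      // to the message publish time.
      c.output(describe(payload, c.timestamp().getMillis()));
    }
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    pipeline
        .apply("Read Messages From PubSub",
            PubsubIO.readMessagesWithAttributes()
                // Placeholder subscription path; replace with a real one.
                .fromSubscription("projects/my-project/subscriptions/my-subscription"))
        .apply("Extract Publish Time", ParDo.of(new LogPublishTimeFn()));

    pipeline.run().waitUntilFinish();
  }
}
```

Note that withTimestampAttribute is deliberately not used here: that option tells PubsubIO to take the timestamp from a named message attribute set by the publisher, rather than from the publish time assigned by Pub/Sub.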