Much of the documentation for the Google Cloud Platform Java SDK 2.x tells you to reference the Apache Beam documentation.
When reading from Pub/Sub using Dataflow, should I still be doing PubsubIO.Read.named("name").topic("")?
Or should I be doing something else?
Also building off of that, is there a way to just print PubSub data received by the Dataflow to standard output or to a file?
For Apache Beam 2.2.0, you can define the following transform to pull messages from a Pub/Sub subscription:
PubsubIO.readMessages().fromSubscription("subscription_name")
This is one way to define a transform that pulls messages from Pub/Sub. The PubsubIO class provides several other read methods, each with slightly different functionality; see the PubsubIO documentation.
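For example, if you need the message attributes as well as the payload, PubsubIO.readMessagesWithAttributes() yields PubsubMessage objects instead of plain strings. A minimal sketch is below; the subscription path is a placeholder, and the print statement is only for illustration:

```java
import java.nio.charset.StandardCharsets;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class ReadWithAttributes {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    pipeline
        // Placeholder subscription path; substitute your own project/subscription.
        .apply(PubsubIO.readMessagesWithAttributes()
            .fromSubscription("projects/my_project/subscriptions/my_subscription"))
        .apply(ParDo.of(new DoFn<PubsubMessage, Void>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            PubsubMessage msg = c.element();
            // The payload is raw bytes; decode it as UTF-8 here for display.
            String payload = new String(msg.getPayload(), StandardCharsets.UTF_8);
            // Attributes arrive as a Map<String, String>.
            System.out.println(payload + " attrs=" + msg.getAttributeMap());
          }
        }));
    pipeline.run();
  }
}
```

By contrast, PubsubIO.readStrings() decodes the payload for you but drops the attributes.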
You can write the Pub/Sub messages to a file using the TextIO class; see the examples in the TextIO documentation. To write Pub/Sub messages to stdout, see the Logging Pipeline Messages documentation.
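Note that a streaming PCollection is unbounded, so it must be windowed before TextIO can write it, and windowed writes with an explicit shard count are required. A sketch, assuming a placeholder subscription and GCS output path:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

public class PubsubToText {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    pipeline
        // Placeholder subscription path; substitute your own.
        .apply("ReadStringsFromPubsub",
            PubsubIO.readStrings()
                .fromSubscription("projects/my_project/subscriptions/my_subscription"))
        // Group the unbounded stream into fixed five-minute windows
        // so TextIO has finite bundles to write.
        .apply("Window", Window.into(FixedWindows.of(Duration.standardMinutes(5))))
        .apply("WriteToGcs",
            TextIO.write()
                .to("gs://my_bucket/output/messages") // placeholder output prefix
                .withWindowedWrites()
                .withNumShards(1));
    pipeline.run();
  }
}
```

One file per window and shard is produced under the output prefix.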
Adding to what Andrew wrote above, the code below reads strings from PubsubIO and writes them to stdout (just for debugging). That said, I will file an internal bug to improve the JavaDoc for PubsubIO; I agree the current documentation is minimal.
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.joda.time.Instant;

public static void main(String[] args) {
  Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
  pipeline
      .apply("ReadStringsFromPubsub",
          PubsubIO.readStrings().fromTopic("/topics/my_project/my_topic"))
      .apply("PrintToStdout", ParDo.of(new DoFn<String, Void>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
          System.out.printf("Received at %s : %s%n", Instant.now(), c.element()); // debug log
        }
      }));
  pipeline.run().waitUntilFinish();
}