How to write messages received from PubSub to a text file in GCS using TextIO in Apache Beam? Saw some methods like withWindowedWrites() and withFilenamePolicy() but couldn't find any example of it in the documentation.
可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
回答1:
Here is an example provided you are using the Java SDK (BEAM 2.1.0).
PipelineOptions options = PipelineOptionsFactory.fromArgs(args)
.withValidation()
.as(PipelineOptions.class);
Pipeline pipeline = Pipeline.create(options);
pipeline.begin()
.apply("PubsubIO",PubsubIO.readStrings()
.withTimestampAttribute("timestamp")
.fromSubscription("projects/YOUR-PROJECT/subscriptions/YOUR-SUBSCRIPTION"))
.apply(Window.<String>into(FixedWindows.of(Duration.standardSeconds(30L))))
.apply(TextIO.write().to("gs://YOUR-BUCKET").withWindowedWrites());
You can see the defaults that the SDK uses for the file naming by exploring the "expand" method in TextIO.Write.expand(PCollection input). Specifically I'd take a look at DefaultFilenamePolicy.java