Apache Beam: Unable to find registrar for gs

Posted 2019-01-28 00:12

Question:

Beam uses both of Google's auto/value and auto/service tools.

I want to run a pipeline with the Dataflow runner, with the data stored on Google Cloud Storage.

I've added these dependencies:

<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
    <version>2.0.0</version>
</dependency>

<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-extensions-google-cloud-platform-core</artifactId>
    <version>2.0.0</version>
</dependency>

I'm able to start the pipeline from IntelliJ. But when the jar is built with mvn package and run with java -jar, it throws an error:

java.lang.IllegalStateException: Unable to find registrar for gs

The fatjar is packaged with maven-assembly-plugin. The GcsFileSystemRegistrar class is in the jar.

Answer 1:

The issue is in the way that you are building your fatjar. The maven-assembly-plugin is not handling the files associated with ServiceLoader correctly. ServiceLoader relies on each implementation being listed in META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar so that Java knows how to find it. Several dependency jars ship their own copy of that file, and when the jars are unpacked into one fatjar without merging, typically only one copy survives.

The contents of META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar in your fatjar are likely only:

org.apache.beam.sdk.io.LocalFileSystemRegistrar

You need it to list the following, plus any other implementations that you want:

org.apache.beam.sdk.io.LocalFileSystemRegistrar
org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar

Your best bet is to build your fatjar with a tool that understands these ServiceLoader requirements, such as the maven-shade-plugin configured with the ServicesResourceTransformer.
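
As a rough sketch of that suggestion, a maven-shade-plugin setup with the ServicesResourceTransformer could look like the fragment below. The plugin version and the main class name com.example.MyPipeline are assumptions for illustration only, and the signature-file filter is an optional extra that is often needed when shading signed dependencies. Adjust all of them to your own build.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <!-- Assumed version; use whichever release your build already pins. -->
  <version>3.2.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <filters>
          <!-- Optional: drop signature files from signed dependencies,
               which otherwise can fail verification in the merged jar. -->
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <transformers>
          <!-- Concatenates every META-INF/services/* file found in the
               dependencies instead of keeping only one copy. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          <!-- Sets Main-Class so the jar can be started with java -jar.
               com.example.MyPipeline is a hypothetical class name. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>com.example.MyPipeline</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After running mvn package, the META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar entry inside the shaded jar should list both registrars shown above.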



Answer 2:

This looks like a problem with the assembly strategy: you should accumulate/merge the service entries for org.apache.beam.sdk.io.FileSystemRegistrar. More on a similar problem here.
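
If you would rather stay with maven-assembly-plugin, one way to merge the service files is a custom assembly descriptor that enables the plugin's metaInf-services container descriptor handler. The fragment below is only a sketch under that assumption: the id "fat" is a hypothetical name, and the descriptor has to be referenced from the plugin configuration in your pom in place of a predefined descriptor.

```xml
<assembly>
  <!-- Hypothetical id; it becomes the classifier of the assembled jar. -->
  <id>fat</id>
  <formats>
    <format>jar</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <containerDescriptorHandlers>
    <!-- Aggregates all META-INF/services/* files from the unpacked
         dependencies rather than letting one overwrite the others. -->
    <containerDescriptorHandler>
      <handlerName>metaInf-services</handlerName>
    </containerDescriptorHandler>
  </containerDescriptorHandlers>
  <dependencySets>
    <dependencySet>
      <outputDirectory>/</outputDirectory>
      <useProjectArtifact>true</useProjectArtifact>
      <unpack>true</unpack>
      <scope>runtime</scope>
    </dependencySet>
  </dependencySets>
</assembly>
```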