Beam uses both Google's auto/value and auto/service tools.
I want to run a pipeline with the Dataflow runner, and my data is stored on Google Cloud Storage.
I've added these dependencies:
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
    <version>2.0.0</version>
</dependency>
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-extensions-google-cloud-platform-core</artifactId>
    <version>2.0.0</version>
</dependency>
I'm able to start the pipeline from IntelliJ. But when the jar is built with mvn package and run with java -jar, it throws an error:
java.lang.IllegalStateException: Unable to find registrar for gs
The fat jar is packaged with the maven-assembly-plugin. The GcsFileSystemRegistrar class is in the jar.
The issue is in the way you are building your fat jar. The maven-assembly-plugin does not handle the files that ServiceLoader depends on correctly. ServiceLoader relies on every implementation being listed in META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar so that Java knows how to find them.
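As a rough illustration of what such a file contains and how it is read, the sketch below parses the text of a provider-configuration file following the rules in the java.util.ServiceLoader documentation: one fully qualified class name per line, '#' starts a comment, blank lines and surrounding whitespace are ignored, and duplicate names are dropped. The class name ProviderConfigParser is hypothetical, and this is a simplified model, not the JDK's own parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: models the provider-configuration format read by java.util.ServiceLoader.
public class ProviderConfigParser {

    /** Returns the provider class names listed in a META-INF/services file, in order, without duplicates. */
    public static List<String> parse(String fileContents) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : fileContents.split("\\R")) {
            // Everything after '#' is a comment.
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            // Surrounding whitespace and blank lines are ignored.
            line = line.trim();
            if (!line.isEmpty()) {
                names.add(line); // LinkedHashSet drops duplicates but keeps first-seen order
            }
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        String contents = "# FileSystemRegistrar implementations\n"
            + "org.apache.beam.sdk.io.LocalFileSystemRegistrar\n"
            + "\n"
            + "org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar\n";
        for (String name : parse(contents)) {
            System.out.println(name);
        }
    }
}
```

If a registrar's class name is missing from this file, ServiceLoader never tries to load it, even when the class itself is present in the jar.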
The contents of META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar in your fat jar are likely only:
org.apache.beam.sdk.io.LocalFileSystemRegistrar
You need it to list (along with any other implementations that you want):
org.apache.beam.sdk.io.LocalFileSystemRegistrar
org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar
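One way to check this is to read that entry straight out of the built jar. The sketch below is a hypothetical diagnostic (ServicesEntryCheck is not part of Beam or Maven) that uses only java.util.jar.JarFile; it assumes the path to your fat jar is passed as the first command-line argument.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical diagnostic: prints which FileSystemRegistrar providers a jar actually lists.
public class ServicesEntryCheck {
    static final String REGISTRAR_SERVICE =
        "META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar";

    /** Returns the trimmed, non-blank lines of the given entry, or an empty list if the jar lacks it. */
    public static List<String> readServiceEntry(String jarPath, String entryName) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            JarEntry entry = jar.getJarEntry(entryName);
            if (entry == null) {
                return Collections.emptyList();
            }
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (!line.trim().isEmpty()) {
                        lines.add(line.trim());
                    }
                }
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the fat jar produced by mvn package (the file name depends on your build).
        List<String> providers = readServiceEntry(args[0], REGISTRAR_SERVICE);
        providers.forEach(System.out::println);
        boolean hasGcs = providers.contains(
            "org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar");
        System.out.println(hasGcs ? "GCS registrar is listed" : "GCS registrar is MISSING");
    }
}
```

Compile it with javac and run it as java ServicesEntryCheck path/to/your-fat.jar; if the GCS line is reported missing, the packaging step dropped it.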
Your best bet is to build your fat jar with a tool that understands these ServiceLoader requirements, such as the maven-shade-plugin configured to use the ServicesResourceTransformer.
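To make the difference concrete, the simplified model below contrasts an overwrite strategy, which keeps a single copy of each services file, with a merge strategy, which concatenates the provider names from every jar (the effect the ServicesResourceTransformer is meant to achieve). This is an illustration written for this answer, not the shade or assembly plugin's code: each Map is a hypothetical stand-in for the META-INF/services entries of one dependency jar, and which copy a real overwrite keeps depends on the tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of two packaging strategies for META-INF/services entries.
public class ServicesMergeSketch {

    /** Overwrite: one copy per path survives (in this model, the last jar processed wins). */
    public static Map<String, List<String>> overwrite(List<Map<String, List<String>>> jars) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map<String, List<String>> jar : jars) {
            out.putAll(jar);
        }
        return out;
    }

    /** Merge: provider names for the same path are concatenated and de-duplicated. */
    public static Map<String, List<String>> mergeServices(List<Map<String, List<String>>> jars) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (Map<String, List<String>> jar : jars) {
            for (Map.Entry<String, List<String>> e : jar.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new LinkedHashSet<>()).addAll(e.getValue());
            }
        }
        Map<String, List<String>> out = new LinkedHashMap<>();
        merged.forEach((path, names) -> out.put(path, new ArrayList<>(names)));
        return out;
    }

    public static void main(String[] args) {
        String path = "META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar";
        // Hypothetical contents: the GCP extension jar and the core SDK jar each ship their own file.
        Map<String, List<String>> gcpJar = Collections.singletonMap(path,
            Collections.singletonList("org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar"));
        Map<String, List<String>> coreJar = Collections.singletonMap(path,
            Collections.singletonList("org.apache.beam.sdk.io.LocalFileSystemRegistrar"));
        List<Map<String, List<String>>> jars = Arrays.asList(gcpJar, coreJar);

        System.out.println("overwrite: " + overwrite(jars).get(path));
        System.out.println("merge:     " + mergeServices(jars).get(path));
    }
}
```

With the jars in this order, the overwrite model reproduces the symptom (only LocalFileSystemRegistrar survives), while the merge model lists both registrars.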
This looks like a problem with the assembly strategy: you should accumulate/merge the service entries for org.apache.beam.sdk.io.FileSystemRegistrar. More on a similar problem here.