Beam uses both Google's auto/value and auto/service tools.
I want to run a pipeline with the Dataflow runner, with the data stored on Google Cloud Storage.
I've added these dependencies:
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
  <version>2.0.0</version>
</dependency>
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-extensions-google-cloud-platform-core</artifactId>
  <version>2.0.0</version>
</dependency>
I'm able to start the pipeline from IntelliJ. But when the jar is built with mvn package and run with java -jar, it throws an error:
java.lang.IllegalStateException: Unable to find registrar for gs
The fatjar is packaged with maven-assembly-plugin. The GcsFileSystemRegistrar class is in the jar.
The issue is in the way that you are building your fatjar. The maven-assembly-plugin is not handling files associated with ServiceLoader correctly. ServiceLoader relies on each implementation being listed in META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar so that Java knows how to find it.

Several dependencies ship a file at that same path, and typically only one copy ends up in the assembled jar, so the META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar in your fatjar likely lists only a single registrar. It needs to list the fully qualified name of GcsFileSystemRegistrar (and any other implementations that you want).
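As an illustration, a correctly merged META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar would look roughly like the listing below. The package names are an assumption based on the Beam 2.0.0 artifacts named in the question, so check them against the service files inside your own dependency jars.

```text
org.apache.beam.sdk.io.LocalFileSystemRegistrar
org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar
```

If the file in your fatjar contains only the first line, the gs scheme has no registrar at runtime, which matches the "Unable to find registrar for gs" error.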
Your best bet is to build your fatjar with a tool that understands these ServiceLoader requirements, such as the maven-shade-plugin configured to use the ServicesResourceTransformer.

This looks like a problem with the assembly strategy: you should accumulate/merge the services for org.apache.beam.sdk.io.FileSystemRegistrar. More on a similar problem here.
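The maven-shade-plugin approach suggested above could be sketched as follows. This is a minimal configuration, not a drop-in replacement for your build: the plugin version and the main class name com.example.MyPipeline are placeholders that do not come from the question, and the signature-file filter is a common precaution for shaded jars rather than something this error requires.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <!-- Assumed version; use whichever release fits your build. -->
  <version>3.0.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <filters>
          <!-- Drop signature files from signed dependencies so the
               merged jar does not fail signature verification. -->
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <transformers>
          <!-- Concatenates every META-INF/services/* file with the same
               name, so all FileSystemRegistrar entries are kept. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          <!-- Sets Main-Class so the jar runs with java -jar.
               com.example.MyPipeline is a hypothetical class name. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>com.example.MyPipeline</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After mvn package, you can check that the services file was merged by opening the shaded jar and looking at META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar.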