java.lang.NoSuchMethodError when deploying my program

Posted 2019-08-28 00:44

Question:

I am writing a program that uploads data to an s3a:// link. The program is compiled with mvn install. Running it locally (i.e. with java -jar jarfile.jar) produces no error, but when I run it through spark-submit (i.e. spark-submit jarfile.jar), it raises this error:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.reloadExistingConfigurations()V
        at org.apache.hadoop.fs.s3a.S3AFileSystem.addDeprecatedKeys(S3AFileSystem.java:181)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.<clinit>(S3AFileSystem.java:185)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        ...

The stack trace points to this portion of my source code:

sparkDataset
        .write()
        .format("parquet")
        .mode(SaveMode.Overwrite)
        .save("some s3a:// link");

where sparkDataset is an instance of org.apache.spark.sql.Dataset.
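
For context, here is a minimal, self-contained sketch of how such a write is typically wired up; the credential keys, input path, and bucket name below are illustrative, not taken from the original program:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class S3AWriteExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("S3AWriteExample")
                .getOrCreate();

        // s3a is configured through Hadoop properties; these values are placeholders.
        spark.sparkContext().hadoopConfiguration()
                .set("fs.s3a.access.key", "<ACCESS_KEY>");
        spark.sparkContext().hadoopConfiguration()
                .set("fs.s3a.secret.key", "<SECRET_KEY>");

        // Read some input, then write it back out as Parquet to an s3a:// path.
        Dataset<Row> sparkDataset = spark.read().parquet("input.parquet");
        sparkDataset.write()
                .format("parquet")
                .mode(SaveMode.Overwrite)
                .save("s3a://some-bucket/some/path");
    }
}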

Trying the suggestions from "How to access s3a:// files from Apache Spark?" was unsuccessful and produced another error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/GlobalStorageStatistics$StorageStatisticsProvider

The problem described in "java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.reloadExistingConfigurations()V" is also unlikely to be the cause, because the program runs locally, where compatibility is not a problem.

In addition, these are the versions of the related libraries that I used:

  • aws-java-sdk-bundle:1.11.199
  • hadoop-aws:3.0.0

I expect files to be written through the s3a:// link. I do not think dependencies are the issue, because the program runs locally; I only face this problem when running it through spark-submit. Does anyone have any ideas on how to resolve this?

Edit: I have also checked that the Spark distribution used by spark-submit is stated to be built for Hadoop 2.7 and above, while I am strictly using Hadoop 3.0.0. Could this be a clue as to why this error happens in my program?
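
One way to confirm which Hadoop version a Spark distribution actually bundles is to list the Hadoop jars it ships with; a quick check, assuming SPARK_HOME points at the unpacked distribution:

# List the Hadoop jars bundled with the Spark distribution.
# For spark-2.4.0-bin-hadoop2.7 these are hadoop-*-2.7.x jars.
ls "$SPARK_HOME/jars" | grep '^hadoop-'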

Answer 1:

The answer to "Run spark-submit with my own build of hadoop" seems to have guided me to my own solution.

Based on my understanding, for some unknown reason, the spark-submit provided by the 'spark-2.4.0-bin-hadoop2.7.tgz' distribution excludes any Hadoop packages compiled into your application and uses its own bundled Hadoop 2.7 jars instead.

The reason the NoSuchMethodError was raised is that the method reloadExistingConfigurations() does not exist until Hadoop 2.8.x, so the Hadoop 2.7 jars that spark-submit puts on the classpath do not have it. Writing a Parquet file to s3a loads S3AFileSystem, whose static initializer invokes this particular method along the way.
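
A quick way to see which Hadoop actually wins on the classpath at runtime is a reflection probe like the sketch below (a hypothetical helper, not part of the original program):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.VersionInfo;

public class HadoopVersionProbe {
    public static void main(String[] args) {
        // Which Hadoop version is loaded, and from which jar.
        System.out.println("Hadoop version: " + VersionInfo.getVersion());
        System.out.println("Loaded from: " + Configuration.class
                .getProtectionDomain().getCodeSource().getLocation());
        try {
            // Public method added in Hadoop 2.8.x; absent in 2.7.x.
            Configuration.class.getMethod("reloadExistingConfigurations");
            System.out.println("reloadExistingConfigurations: present (Hadoop >= 2.8)");
        } catch (NoSuchMethodException e) {
            System.out.println("reloadExistingConfigurations: missing (Hadoop <= 2.7)");
        }
    }
}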

My solution is to use the separate 'spark-2.4.0-bin-without-hadoop.tgz' distribution and connect it to Hadoop 3.0.0, so that spark-submit uses the correct version of Hadoop even though it excludes the packages bundled in my application during execution.
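
The 'Hadoop free' builds are wired to an external Hadoop installation through the SPARK_DIST_CLASSPATH variable; a minimal sketch, assuming the hadoop 3.0.0 binary is on the PATH:

# In conf/spark-env.sh of the spark-2.4.0-bin-without-hadoop distribution:
# put the jars of the external Hadoop 3.0.0 installation on Spark's classpath.
export SPARK_DIST_CLASSPATH=$(hadoop classpath)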

In addition, since the packages would be excluded by spark-submit anyway, I would not create a fat jar during compilation through Maven. Instead, I would use the --packages flag during execution to specify the dependencies required to run my application.
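
For example, a submission along these lines (the jar name is the placeholder from the question; hadoop-aws:3.0.0 pulls in the matching aws-java-sdk-bundle:1.11.199 transitively):

spark-submit \
    --packages org.apache.hadoop:hadoop-aws:3.0.0 \
    jarfile.jar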