I am writing a program to upload data to an `s3a://` link. The program is compiled through `mvn install`. Running the program locally (as in `java -jar jarfile.jar`) returns no error. However, when I use spark-submit (as in `spark-submit jarfile.jar`), it fails with this error:
```
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.reloadExistingConfigurations()V
    at org.apache.hadoop.fs.s3a.S3AFileSystem.addDeprecatedKeys(S3AFileSystem.java:181)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.<clinit>(S3AFileSystem.java:185)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    ...
```
The error log traced to this portion of my source code:
```java
sparkDataset
    .write()
    .format("parquet")
    .mode(SaveMode.Overwrite)
    .save("some s3a:// link");
```
where `sparkDataset` is an instance of `org.apache.spark.sql.Dataset`.
Trying the suggestions from How to access s3a:// files from Apache Spark? was unsuccessful and returned another error:
```
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/GlobalStorageStatistics$StorageStatisticsProvider
```
A genuine incompatibility behind the `java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.reloadExistingConfigurations()V` also seems unlikely, because the program runs fine locally, where compatibility is not a problem.
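To see which `Configuration` actually wins on the classpath at runtime, a small reflection-only check can be dropped into the jar. This is just a diagnostic sketch: the class and method names come from the stack trace above, while `HadoopConfCheck` and its output strings are names I made up. It compiles without Hadoop on the compile classpath, since everything is looked up by string.

```java
import java.security.CodeSource;

public class HadoopConfCheck {
    // Returns a human-readable report on whether the named class and public
    // method are visible on the current classpath, and which jar the class
    // was loaded from (null code source means the bootstrap classpath).
    static String check(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            String where = (src == null)
                ? "the bootstrap classpath"
                : src.getLocation().toString();
            cls.getMethod(methodName);
            return methodName + "() found in " + where;
        } catch (ClassNotFoundException e) {
            return className + " not on the classpath";
        } catch (NoSuchMethodException e) {
            return methodName + "() missing -- an older jar shadows the expected one";
        }
    }

    public static void main(String[] args) {
        // The exact class and method from the NoSuchMethodError.
        System.out.println(check("org.apache.hadoop.conf.Configuration",
                                 "reloadExistingConfigurations"));
    }
}
```

Running this under `java -jar` versus `spark-submit` should show whether spark-submit puts a different (older) hadoop-common in front of the one bundled in the jar.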
In addition, these are the versions of the related libraries that I use:
- aws-java-sdk-bundle:1.11.199
- hadoop-aws:3.0.0
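For reference, those two dependencies are declared in my `pom.xml` roughly like this (I am assuming the standard Maven coordinates for these artifacts; the rest of the pom is omitted):

```xml
<!-- Assumed standard Maven coordinates for the versions listed above -->
<dependencies>
  <dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-bundle</artifactId>
    <version>1.11.199</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>3.0.0</version>
  </dependency>
</dependencies>
```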
I am expecting files to be written through the `s3a://` links. I don't think dependencies are the issue, because the program runs locally; I only face this problem when running it with spark-submit. Does anyone have ideas on how to resolve this?
Edit: I have also checked that the Spark distribution used by spark-submit says it is built for Hadoop 2.7 and above, while I am strictly using Hadoop 3.0.0. Could this mismatch be a clue to why the error happens in my program?