Cannot load main class from JAR file in Spark Submit

Published 2019-04-28 19:57

Question:

I am trying to run a Spark job. This is my shell script, which is located at /home/full/path/to/file/shell/my_shell_script.sh:

confLocation=../conf/my_config_file.conf &&
executors=8 &&
memory=2G &&
entry_function=my_function_in_python &&
dos2unix $confLocation &&
spark-submit \
        --master yarn-client \
        --num-executors $executors \
        --executor-memory $memory \
        --py-files /home/full/path/to/file/python/my_python_file.py $entry_function $confLocation

When I run this, I get an error that says:

Error: Cannot load main class from JAR file: /home/full/path/to/file/shell/my_function_in_python

My impression here is that it is looking in the wrong place (the python file is located in the python directory, not the shell directory).

Answer 1:

The --py-files flag is for additional Python file dependencies used by your program. As you can see in SparkSubmit.scala, spark-submit uses the so-called "primary argument" (the first non-flag argument) to decide whether it is in "submit jarfile" mode or "submit python main" mode.

That's why you see it trying to load your "$entry_function" as a jar file that doesn't exist: spark-submit only assumes you're running Python if that primary argument ends with ".py", and otherwise defaults to assuming you have a .jar file.

Instead of using --py-files, just make /home/full/path/to/file/python/my_python_file.py the primary argument; then you can either write a bit of Python to take the entry function name as a program argument, or simply call your entry function from the main block of the file itself.
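A minimal sketch of the first option, dispatching on the entry-function name passed as the first program argument (the getattr-free globals() lookup and the dispatch helper are illustrative assumptions, not from the original script; the function and file names come from the question):

```python
# my_python_file.py -- passed to spark-submit as the primary argument:
#   spark-submit ... my_python_file.py my_function_in_python ../conf/my_config_file.conf
import sys

def my_function_in_python(conf_location):
    # Placeholder body; the real Spark job logic would go here.
    return "ran with " + conf_location

def dispatch(entry_function, conf_location):
    # Look up the requested entry function by name in this module and call it.
    func = globals()[entry_function]
    return func(conf_location)

if __name__ == "__main__":
    # sys.argv[1] is $entry_function, sys.argv[2] is $confLocation
    entry_function, conf_location = sys.argv[1], sys.argv[2]
    dispatch(entry_function, conf_location)
```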

Alternatively, you can still use --py-files and then create a new main .py file which calls your entry function, and then pass that main .py file as the primary argument instead.
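A sketch of that alternative: a small wrapper passed as the primary argument, which imports the dependency shipped via --py-files and invokes the entry function. The wrapper filename (main.py) and the run_entry helper are hypothetical names for illustration:

```python
# main.py -- hypothetical wrapper used as the primary argument:
#   spark-submit --py-files /home/full/path/to/file/python/my_python_file.py \
#       main.py ../conf/my_config_file.conf
import importlib
import sys

def run_entry(module_name, entry_function, conf_location):
    # Import the module made available by --py-files and call the
    # named entry function with the config path.
    module = importlib.import_module(module_name)
    return getattr(module, entry_function)(conf_location)

if __name__ == "__main__":
    run_entry("my_python_file", "my_function_in_python", sys.argv[1])
```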



Answer 2:

What worked for me was simply passing in the python file without the --py-files flag. It looks like this:

confLocation=../conf/my_config_file.conf &&
executors=8 &&
memory=2G &&
entry_function=my_function_in_python &&
dos2unix $confLocation &&
spark-submit \
        --master yarn-client \
        --num-executors $executors \
        --executor-memory $memory \
        /home/full/path/to/file/python/my_python_file.py $entry_function $confLocation