I am trying to run a Spark job. This is my shell script, which is located at /home/full/path/to/file/shell/my_shell_script.sh:
confLocation=../conf/my_config_file.conf &&
executors=8 &&
memory=2G &&
entry_function=my_function_in_python &&
dos2unix $confLocation &&
spark-submit \
--master yarn-client \
--num-executors $executors \
--executor-memory $memory \
--py-files /home/full/path/to/file/python/my_python_file.py $entry_function $confLocation
When I run this, I get an error that says:
Error: Cannot load main class from JAR file: /home/full/path/to/file/shell/my_function_in_python
My impression here is that it is looking in the wrong place (the python file is located in the python directory, not the shell directory).
What worked for me was to simply pass the python file in directly, without the --py-files flag.
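A minimal sketch of that call, assuming the same variables as the script above and that my_python_file.py now invokes the entry function itself (so $entry_function is no longer passed on the command line):

spark-submit \
--master yarn-client \
--num-executors $executors \
--executor-memory $memory \
/home/full/path/to/file/python/my_python_file.py $confLocation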
The --py-files flag is for additional python file dependencies used by your program; you can see in SparkSubmit.scala that it uses the so-called "primary argument", meaning the first non-flag argument, to decide whether to run in "submit jarfile" mode or "submit python main" mode. That's why you see it trying to load your $entry_function as a jarfile that doesn't exist: it only assumes you are running Python if that primary argument ends with ".py", and otherwise defaults to assuming you have a .jar file.
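In other words, which mode you get depends only on that primary argument; an informal illustration using the values from the question:

# primary argument ends in .py -> "submit python main" mode
spark-submit --master yarn-client /home/full/path/to/file/python/my_python_file.py $confLocation

# primary argument does not end in .py -> Spark treats it as a jarfile and tries
# to load a main class from it, which is exactly the error above
spark-submit --master yarn-client --py-files /home/full/path/to/file/python/my_python_file.py $entry_function $confLocation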
Instead of using --py-files, just make /home/full/path/to/file/python/my_python_file.py the primary argument; then you can either do some fancy python to take the "entry function" as a program argument, or simply call your entry function from the main function inside the python file itself.

Alternatively, you can still use --py-files and create a new main.py file which calls your entry function, then pass that main.py file as the primary argument instead.
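A sketch of that alternative, assuming a hypothetical /home/full/path/to/file/python/main.py whose only job is to import my_python_file and call my_function_in_python:

spark-submit \
--master yarn-client \
--num-executors $executors \
--executor-memory $memory \
--py-files /home/full/path/to/file/python/my_python_file.py \
/home/full/path/to/file/python/main.py $confLocation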