Adding postgresql jar through spark-submit on Amazon EMR

Posted 2019-04-17 00:01

I've tried spark-submit with --driver-class-path and with --jars, and I've also tried the method described at https://petz2000.wordpress.com/2015/08/18/get-blas-working-with-spark-on-amazon-emr/

When I use SPARK_CLASSPATH on the command line, as in

SPARK_CLASSPATH=/home/hadoop/pg_jars/postgresql-9.4.1208.jre7.jar pyspark

I get this error:

Found both spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former.

But I'm not able to add it. How do I add the PostgreSQL JDBC jar so I can use it from pyspark? I'm using EMR version 4.2.

Thanks

2 Answers
你好瞎i · 2019-04-17 00:23

1) Clear the environment variable:

unset SPARK_CLASSPATH

2) Use the --jars option to distribute the Postgres driver across your cluster:

pyspark --jars=/home/hadoop/pg_jars/postgresql-9.4.1208.jre7.jar
# or
spark-submit --jars=/home/hadoop/pg_jars/postgresql-9.4.1208.jre7.jar <your py script or app jar>
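
Once the jar is distributed with --jars, you can read from Postgres through the JDBC data source. A minimal sketch for the Spark 1.5.x that ships with EMR 4.2; the host, database, table, and credentials below are all placeholders:

# In the pyspark shell, sc and sqlContext already exist; they are created
# here explicitly so the sketch also works as a standalone script.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="pg-read")
sqlContext = SQLContext(sc)

# All connection details are placeholder values, substitute your own.
df = (sqlContext.read.format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb")
      .option("dbtable", "public.my_table")
      .option("user", "myuser")
      .option("password", "mypassword")
      .option("driver", "org.postgresql.Driver")
      .load())

df.show()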
小情绪 Triste * · 2019-04-17 00:35

Adding the jar path to the spark.driver.extraClassPath entry in /etc/spark/conf/spark-defaults.conf solved the issue for me.
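
For reference, a sketch of what the relevant rows can look like (the existing entries vary by EMR release, so append the jar with a ":" separator rather than overwriting them; /existing/entries below is a placeholder):

# /etc/spark/conf/spark-defaults.conf
spark.driver.extraClassPath      /existing/entries:/home/hadoop/pg_jars/postgresql-9.4.1208.jre7.jar
spark.executor.extraClassPath    /existing/entries:/home/hadoop/pg_jars/postgresql-9.4.1208.jre7.jar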
