I've tried spark-submit with --driver-class-path and with --jars, and I've also tried the method described at https://petz2000.wordpress.com/2015/08/18/get-blas-working-with-spark-on-amazon-emr/
When I use SPARK_CLASSPATH on the command line, as in
SPARK_CLASSPATH=/home/hadoop/pg_jars/postgresql-9.4.1208.jre7.jar pyspark
I get this error:
Found both spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former.
But I'm still not able to add the jar. How do I add the PostgreSQL JDBC jar file so that I can use it from pyspark? I'm using EMR release 4.2.
Thanks
1) Clear the environment variable:
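SPARK_CLASSPATH is deprecated, and Spark refuses to start when it is set alongside spark.executor.extraClassPath (which, as the error message shows, is already set on EMR). A minimal sketch, assuming a bash shell on the master node:

unset SPARK_CLASSPATH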
2) Use the --jars option to distribute the Postgres driver over your cluster:
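For example, reusing the jar path from the question (adjust it to wherever your driver jar actually lives):

pyspark --jars /home/hadoop/pg_jars/postgresql-9.4.1208.jre7.jar

Once the shell is up, you can sanity-check the driver with a JDBC read; the host, database, table, and credentials below are placeholders, not values from the question:

# Read a table over JDBC; url, dbtable, user, and password are hypothetical
df = sqlContext.read.format("jdbc").options(
    url="jdbc:postgresql://your-db-host:5432/yourdb",
    dbtable="your_table",
    user="your_user",
    password="your_password",
    driver="org.postgresql.Driver",  # standard PostgreSQL JDBC driver class
).load()
df.show()

If the driver still cannot be found on the driver side, passing --driver-class-path with the same jar path in addition to --jars has been a common workaround on older Spark releases.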
Adding the jar path to the
spark.driver.extraClassPath
row in
/etc/spark/conf/spark-defaults.conf
solved my issue.
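For reference, the edited row might look like the sketch below. EMR ships spark-defaults.conf with a pre-populated spark.driver.extraClassPath, so append the jar with a colon separator rather than replacing the existing entries (the leading path here is illustrative):

spark.driver.extraClassPath /existing/emr/classpath/entries:/home/hadoop/pg_jars/postgresql-9.4.1208.jre7.jar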