Apache pyspark using oracle jdbc to pull data. Dri

Posted 2019-05-28 07:14

Question:

I am using Apache Spark's PySpark (spark-1.5.2-bin-hadoop2.6) on Windows 7.

I keep getting this error when I run my Python script in PySpark:

An error occurred while calling o23.load. java.sql.SQLException: No suitable driver found for jdbc:oracle:thin:------------------------------------connection

Here is my Python file:

import os

# Point PySpark at the local Spark install and try to add the Oracle JDBC jar
# to the classpath via SPARK_CLASSPATH
os.environ["SPARK_HOME"] = "C:\\spark-1.5.2-bin-hadoop2.6"
os.environ["SPARK_CLASSPATH"] = "L:\\Pyspark_Snow\\ojdbc6.jar"

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

# Local mode with 8 threads
spark_config = SparkConf().setMaster("local[8]")
sc = SparkContext(conf=spark_config)
sqlContext = SQLContext(sc)

# Load the Oracle table over JDBC
df = (sqlContext
    .load(source="jdbc",
          url="jdbc:oracle:thin://x.x.x.x/xdb?user=xxxxx&password=xxxx",
          dbtable="x.users")
 )
sc.stop()

Answer 1:

Unfortunately, setting the SPARK_CLASSPATH environment variable won't work. You need to declare

spark.driver.extraClassPath L:\\Pyspark_Snow\\ojdbc6.jar

in your /path/to/spark/conf/spark-defaults.conf, or simply run your spark-submit job with the additional --jars argument:

spark-submit --jars "L:\\Pyspark_Snow\\ojdbc6.jar" yourscript.py
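If you launch the script with plain python instead of spark-submit, a workaround commonly used with Spark 1.x is to put the same flags into the PYSPARK_SUBMIT_ARGS environment variable before the SparkContext (and its driver JVM) is created. A minimal sketch only, reusing the jar path from the question; the trailing "pyspark-shell" token is required by the PySpark launcher:

import os

# Sketch: PYSPARK_SUBMIT_ARGS is read when the gateway JVM is launched,
# so it must be set before the SparkContext is created.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    '--jars "L:\\Pyspark_Snow\\ojdbc6.jar" '
    '--driver-class-path "L:\\Pyspark_Snow\\ojdbc6.jar" '
    'pyspark-shell'
)

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

sc = SparkContext(conf=SparkConf().setMaster("local[8]"))
sqlContext = SQLContext(sc)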


Answer 2:

You can also add the jar using both --jars and --driver-class-path, and then name the JDBC driver class explicitly, as sketched below. See https://stackoverflow.com/a/36328672/1547734
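For reference, a minimal sketch of that approach against the question's setup (the URL and table name are the question's placeholders; oracle.jdbc.OracleDriver is the driver class shipped in ojdbc6.jar), using the DataFrameReader API available since Spark 1.4:

# Sketch: assumes ojdbc6.jar was already supplied via --jars / --driver-class-path
df = (sqlContext.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin://x.x.x.x/xdb?user=xxxxx&password=xxxx")
      .option("dbtable", "x.users")
      .option("driver", "oracle.jdbc.OracleDriver")  # name the driver class explicitly
      .load())

Naming the driver class saves DriverManager from having to discover a driver for the URL on its own, which is what the "No suitable driver found" message is complaining about.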