I have some third-party database client libraries in Java. I want to access them through
java_gateway.py
E.g. to make the client class (not a JDBC driver!) available to the Python client via the Java gateway:
java_import(gateway.jvm, "org.mydatabase.MyDBClient")
It is not clear where to add the third-party libraries to the JVM classpath. I tried adding them to compute-classpath.sh, but that did not seem to work: I get
Py4JError: Trying to call a package
Also, when comparing to Hive: the Hive jar files are NOT loaded via compute-classpath.sh, so that makes me suspicious. There seems to be some other mechanism setting up the JVM-side classpath.
You could add
--jars xxx.jar
when using spark-submit, or set the environment variable SPARK_CLASSPATH:
SPARK_CLASSPATH=xxx.jar your_spark_script.py
where your_spark_script.py was written with the pyspark API.
One more thing you can do is to add the jar to the jars folder of the pyspark installation, usually /python3.6/site-packages/pyspark/jars.
Be careful if you are using a virtual environment: the jar needs to go into the pyspark installation inside the virtual environment.
This way you can use the jar without passing it on the command line or loading it in your code.
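Because the right jars folder depends on which interpreter's pyspark you are running (especially inside a virtualenv), here is a small sketch to locate it; it assumes only the standard pyspark package layout, which bundles its jars in a jars subfolder:

```python
import importlib.util
import os

def pyspark_jars_dir():
    """Return the jars folder of the active pyspark installation, or None."""
    spec = importlib.util.find_spec("pyspark")
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at pyspark/__init__.py; the bundled jars
    # live in a "jars" subfolder next to it
    return os.path.join(os.path.dirname(spec.origin), "jars")

jars = pyspark_jars_dir()
print(jars if jars is not None else "pyspark is not installed here")
```

Copying your jar into the folder this prints guarantees it lands in the installation your interpreter actually uses.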
You could add the path to the jar file using Spark configuration at runtime.
Refer to the Spark configuration documentation for more information.
E.g. if you have extracted the jar file on the C: drive into a folder named sparkts, the value should be: C:\sparkts
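The answer's original code example did not survive; here is a minimal sketch of what runtime configuration could look like, assuming the property in question is spark.driver.extraClassPath (a standard Spark property that accepts a folder like the C:\sparkts above — the answer may have used a different one) and that pyspark is installed:

```python
from pyspark import SparkConf
from pyspark.sql import SparkSession

# Assumption: the folder containing the extracted jar(s), per the answer above.
jar_dir = r"C:\sparkts"

# extraClassPath entries are prepended to the JVM classpath. They only take
# effect when the JVM launches, so set them at session-creation time rather
# than on an already-running SparkContext.
conf = (
    SparkConf()
    .set("spark.driver.extraClassPath", jar_dir)
    .set("spark.executor.extraClassPath", jar_dir)
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()
```

If a gateway JVM is already running (e.g. in a shared notebook kernel), these properties are ignored; in that case put them in spark-defaults.conf or pass the jar via spark-submit instead.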
You can add external jars as arguments when launching pyspark (multiple jars are comma-separated):
pyspark --jars xxx.jar