How to load jar package such as JDBC in Kubernetes

2019-06-11 14:36发布

问题:

I am following the instructions laid out on Kubernetes' Spark example. I can get to the step with launching the PySpark shell. However, I need to use PySpark with JDBC to connect to my Postgres database. Before I tried Kubernetes, I got the JDBC working with Spark using the spark-defaults.conf file:

spark.driver.extraClassPath /spark/postgresql-9.4.1209.jre7.jar
spark.executor.extraClassPath /spark/postgresql-9.4.1209.jre7.jar

I also had to download the driver into the location first. How do I achieve the same thing with Kubernetes? I don't think I can do

kubectl exec zeppelin-controller-xzlrf -it pyspark --jars /spark/postgresql-9.4.1209.jre7.jar

because the jar would have to be inside the container first. Therefore, maybe I can get it working if I can get the jar file inside the container, but how do I do that? Any thoughts or help is greatly appreciated.

UPDATE: I tried following @LostInOverflow's solution but encountered the following:

kubectl exec zeppelin-controller-2p3ew -it -- pyspark --packages org.postgresql:postgresql:9.4.1209.jre7.jar

which appears to boot up and recognizes the package argument but still fails:

Python 2.7.9 (default, Mar  1 2015, 12:57:24) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.postgresql#postgresql added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
:: resolution report :: resolve 2294ms :: artifacts dl 0ms
    :: modules in use:
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
    ---------------------------------------------------------------------

:: problems summary ::
:::: WARNINGS
        module not found: org.postgresql#postgresql;9.4.1209.jre7.jar

    ==== local-m2-cache: tried

      file:/root/.m2/repository/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.pom

      -- artifact org.postgresql#postgresql;9.4.1209.jre7.jar!postgresql.jar:

      file:/root/.m2/repository/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.jar

    ==== local-ivy-cache: tried

      /root/.ivy2/local/org.postgresql/postgresql/9.4.1209.jre7.jar/ivys/ivy.xml

    ==== central: tried

      https://repo1.maven.org/maven2/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.pom

      -- artifact org.postgresql#postgresql;9.4.1209.jre7.jar!postgresql.jar:

      https://repo1.maven.org/maven2/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.jar

    ==== spark-packages: tried

      http://dl.bintray.com/spark-packages/maven/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.pom

      -- artifact org.postgresql#postgresql;9.4.1209.jre7.jar!postgresql.jar:

      http://dl.bintray.com/spark-packages/maven/org/postgresql/postgresql/9.4.1209.jre7.jar/postgresql-9.4.1209.jre7.jar.jar

        ::::::::::::::::::::::::::::::::::::::::::::::

        ::          UNRESOLVED DEPENDENCIES         ::

        ::::::::::::::::::::::::::::::::::::::::::::::

        :: org.postgresql#postgresql;9.4.1209.jre7.jar: not found

        ::::::::::::::::::::::::::::::::::::::::::::::



:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.postgresql#postgresql;9.4.1209.jre7.jar: not found]
    at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1011)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:286)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Traceback (most recent call last):
  File "/opt/spark/python/pyspark/shell.py", line 43, in <module>
    sc = SparkContext(pyFiles=add_files)
  File "/opt/spark/python/pyspark/context.py", line 110, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "/opt/spark/python/pyspark/context.py", line 234, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
>>> 

回答1:

You can use --packages with coordinates in place of --jars:

--packages org.postgresql:postgresql:9.4.1209.jre7.jar