This page inspired me to try out spark-csv for reading .csv files in PySpark. I found a couple of posts, such as this one, describing how to use spark-csv. But I am not able to initialize the IPython instance by including either the .jar file or the package extension at start-up, as can be done through spark-shell.
That is, instead of `ipython notebook --profile=pyspark`, I tried `ipython notebook --profile=pyspark --packages com.databricks:spark-csv_2.10:1.0.3`, but it is not supported.
Please advise.
I believe you can also add this as a variable to your spark-defaults.conf file. So something like:
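A minimal sketch of such an entry, assuming the standard `spark.jars.packages` property and the package coordinates from the question:

```
# $SPARK_HOME/conf/spark-defaults.conf
spark.jars.packages   com.databricks:spark-csv_2.10:1.0.3
```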
This will load the spark-csv library into PySpark every time you launch the driver.
Obviously zero's answer is more flexible because you can add these lines to your PySpark app before you import the PySpark package:
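A sketch of what those lines might look like, assuming the package coordinates from the question; the key point is that the variable is set before `pyspark` is imported:

```python
import os

# Must be set before pyspark is imported, so that the --packages flag
# is picked up when the JVM is launched.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages com.databricks:spark-csv_2.10:1.0.3 pyspark-shell"
)

import pyspark
```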
This way you are only importing the packages you actually need for your script.
You can simply pass it in the `PYSPARK_SUBMIT_ARGS` environment variable. For example:
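A sketch, assuming a bash shell and the spark-csv coordinates from the question; the trailing `pyspark-shell` token is required when the arguments are supplied this way:

```bash
export PYSPARK_SUBMIT_ARGS="--packages com.databricks:spark-csv_2.10:1.0.3 pyspark-shell"
ipython notebook --profile=pyspark
```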
These properties can also be set dynamically in your code, before `SparkContext`/`SparkSession` and the corresponding JVM have been started:
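A minimal sketch, again assuming the package coordinates from the question; the local master and the CSV path are hypothetical placeholders:

```python
import os

# Must run before the SparkContext is created; --packages is only
# read when the JVM is launched.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages com.databricks:spark-csv_2.10:1.0.3 pyspark-shell"
)

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local[*]", "spark-csv-example")  # assumed local master
sqlContext = SQLContext(sc)

# spark-csv is now on the classpath (hypothetical file path):
df = sqlContext.read.format("com.databricks.spark.csv") \
    .option("header", "true") \
    .load("some_file.csv")
```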