PySpark SparkSession Builder with Kubernetes Master

Posted 2020-06-04 00:40

I recently saw a pull request, merged into the apache/spark repository, that apparently adds initial Python bindings for PySpark on K8s. I commented on the PR asking how to use spark-on-k8s from a Python Jupyter notebook, and was told to ask my question here.

My question is:

Is there a way to create SparkContexts using PySpark's SparkSession.Builder with master set to k8s://<...>:<...>, and have the resulting jobs run on spark-on-k8s rather than locally?

E.g.:

from pyspark.sql import SparkSession
spark = SparkSession.builder.master('k8s://https://kubernetes:443').getOrCreate()

I have an interactive Jupyter notebook running inside a Kubernetes pod, and I'm trying to use PySpark to create a SparkContext that runs on spark-on-k8s instead of resorting to using local[*] as master.

So far, whenever I set master to k8s://<...>, I get the following error:

Error: Python applications are currently not supported for Kubernetes.

It seems that PySpark always runs in client mode, which spark-on-k8s does not appear to support at the moment; perhaps there is a workaround I'm not aware of.

Thanks in advance!

1 answer
手持菜刀,她持情操
Post #2 · 2020-06-04 01:02

PySpark client mode works as of Spark 2.4.0, the latest version at the time of writing.
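Since this answer hinges on the Spark version, it can help to confirm what is installed before debugging further. The error quoted in the question is, to my understanding, what Spark 2.3.x reports for Python applications on Kubernetes. The sketch below is a hypothetical helper (parse_version and supports_pyspark_on_k8s are illustrative names, not part of PySpark) that compares a version string such as pyspark.__version__ against 2.4.0.

```python
import re


def parse_version(version):
    """Turn '2.4.0', '2.4' or '3.0.0-preview' into a 3-tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits, so '0-preview' or '0rc1' become 0
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))  # pad short versions such as '2.4'
    return tuple(parts)


def supports_pyspark_on_k8s(version):
    """True if the given Spark version is at least 2.4.0."""
    return parse_version(version) >= (2, 4, 0)


if __name__ == "__main__":
    try:
        import pyspark
        print(pyspark.__version__, supports_pyspark_on_k8s(pyspark.__version__))
    except ImportError:
        print("pyspark is not installed in this environment")
```

Note that the Spark version inside the executor image should match the driver's version as well; this check only covers the notebook side.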

This is how I did it (in Jupyter lab):

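import os
# Python interpreter for the executors; this path must exist inside the container image
os.environ['PYSPARK_PYTHON']="/usr/bin/python3.6"
# Python interpreter for the driver side
os.environ['PYSPARK_DRIVER_PYTHON']="/usr/bin/python3.6"

from pyspark import SparkConf
from pyspark.sql import SparkSession

sparkConf = SparkConf()
# Kubernetes API server URL, prefixed with k8s://
sparkConf.setMaster("k8s://https://localhost:6443")
sparkConf.setAppName("KUBERNETES-IS-AWESOME")
# PySpark-enabled image used for the executor pods
sparkConf.set("spark.kubernetes.container.image", "robot108/spark-py:latest")
# Namespace in which the executor pods are created
sparkConf.set("spark.kubernetes.namespace", "playground")

# The driver runs in this process (client mode); executors are launched as pods
spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
sc = spark.sparkContext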

Note: I am running Kubernetes locally on a Mac with Docker Desktop.
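In the setup above the driver runs on the host machine. When the notebook itself runs inside a Kubernetes pod, as in the question, the executor pods must be able to connect back to the driver, which Spark's client-mode settings spark.driver.host, spark.driver.port and spark.driver.bindAddress control. The following is a minimal sketch, not a tested recipe: the helper names k8s_client_mode_conf and build_session are hypothetical, port 29413 is an arbitrary choice, and it assumes the pod IP is either passed in or exposed as a POD_IP environment variable (for example via the Downward API). Setting spark.kubernetes.driver.pod.name to the notebook pod's name lets the executor pods be garbage-collected along with it.

```python
import os
import socket


def k8s_client_mode_conf(master_url, image, namespace,
                         driver_host=None, driver_port=29413, driver_pod_name=None):
    """Collect Spark settings for client mode with the driver inside a pod.

    driver_port=29413 is a hypothetical choice; any port works as long as
    executor pods can reach it.
    """
    if driver_host is None:
        # Assumption: POD_IP is injected via the Downward API; otherwise
        # fall back to resolving this pod's own hostname.
        driver_host = os.environ.get("POD_IP") or socket.gethostbyname(socket.gethostname())
    conf = {
        "spark.master": master_url,
        "spark.kubernetes.container.image": image,
        "spark.kubernetes.namespace": namespace,
        # Address and port the executors use to connect back to the driver
        "spark.driver.host": driver_host,
        "spark.driver.port": str(driver_port),
        # Listen on all interfaces inside the pod
        "spark.driver.bindAddress": "0.0.0.0",
    }
    if driver_pod_name:
        # Makes the driver pod the owner of its executor pods
        conf["spark.kubernetes.driver.pod.name"] = driver_pod_name
    return conf


def build_session(conf):
    """Apply the settings through SparkSession.builder and start the session."""
    from pyspark.sql import SparkSession  # imported here so the helper above needs no pyspark
    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Inside the notebook pod, usage might look like spark = build_session(k8s_client_mode_conf("k8s://https://kubernetes:443", "robot108/spark-py:latest", "playground", driver_pod_name=os.environ.get("HOSTNAME"))), since HOSTNAME usually holds the pod name. The pod's service account also needs permission to create pods in that namespace.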
