I recently saw a pull request that was merged into the apache/spark repository that apparently adds initial Python bindings for PySpark on K8s. I posted a comment on the PR asking how to use spark-on-k8s from a Python Jupyter notebook, and was told to ask my question here.
My question is:
Is there a way to create SparkContexts using PySpark's SparkSession.Builder with master set to k8s://<...>:<...>, and have the resulting jobs run on spark-on-k8s instead of on local?
E.g.:
from pyspark.sql import SparkSession
spark = SparkSession.builder.master('k8s://https://kubernetes:443').getOrCreate()
I have an interactive Jupyter notebook running inside a Kubernetes pod, and I'm trying to use PySpark to create a SparkContext that runs on spark-on-k8s instead of resorting to local[*] as master.
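For reference, here's roughly what I'd hope a working setup to look like from the notebook. This is only a sketch, assuming a Spark build where client mode against Kubernetes is supported; the API server address, container image, namespace, executor count, and driver hostname are placeholders I made up, not a known-working configuration:

from pyspark.sql import SparkSession

# Sketch of the session I'd like to build from inside the notebook pod.
# All the values below are placeholders for illustration.
spark = (
    SparkSession.builder
    .master('k8s://https://kubernetes.default.svc:443')
    # Image the executor pods would run (placeholder name):
    .config('spark.kubernetes.container.image', 'my-registry/spark-py:latest')
    .config('spark.kubernetes.namespace', 'default')
    .config('spark.executor.instances', '2')
    # In client mode the executors need a route back to the driver,
    # e.g. a headless service pointing at this notebook pod (placeholder):
    .config('spark.driver.host', 'jupyter-headless.default.svc')
    .getOrCreate()
)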
So far, whenever I set master to k8s://<...>, I've been getting this error:

Error: Python applications are currently not supported for Kubernetes.
It seems like PySpark always runs in client mode, which doesn't appear to be supported for spark-on-k8s at the moment; perhaps there's a workaround I'm not aware of.
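In case it helps, this is how I've been checking the deploy mode of whatever session I do manage to create (spark.submit.deployMode is a standard Spark config key):

# Prints 'client' when the session was created in client mode:
print(spark.sparkContext.getConf().get('spark.submit.deployMode'))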
Thanks in advance!