Is it possible to get the current Spark context settings in PySpark?

Posted 2019-01-30 18:57

I'm trying to get the path to spark.worker.dir for the current SparkContext.

If I explicitly set it as a config param, I can read it back out of SparkConf, but is there any way to access the complete config (including all defaults) using PySpark?

10 answers
家丑人穷心不美
#2 · 2019-01-30 19:03

For a complete overview of your Spark environment and configuration, I found the following code snippets useful:

SparkContext:

for item in sorted(sc._conf.getAll()): print(item)

Hadoop Configuration:

# Pull the JVM-side Hadoop Configuration through the Py4J gateway and copy it
# into a plain Python dict so the entries can be sorted and printed.
hadoopConf = {}
iterator = sc._jsc.hadoopConfiguration().iterator()
while iterator.hasNext():
    prop = iterator.next()
    hadoopConf[prop.getKey()] = prop.getValue()
for item in sorted(hadoopConf.items()): print(item)

Environment variables:

import os
for item in sorted(os.environ.items()): print(item)
劫难
#3 · 2019-01-30 19:04

Spark 1.6+ (Scala):

sc.getConf.getAll.foreach(println)
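
A PySpark equivalent of the same call, for anyone working from Python rather than Scala (a minimal sketch, assuming an existing SparkContext named sc, e.g. the one the pyspark shell creates):

# Print every (key, value) pair from the SparkContext's configuration.
# Assumes sc is an existing SparkContext (e.g. created by the pyspark shell).
for key, value in sorted(sc.getConf().getAll()):
    print(key, value)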
(account banned)
#4 · 2019-01-30 19:07

Spark 2.1+

spark.sparkContext.getConf().getAll(), where spark is your SparkSession. This returns a list of (key, value) tuples with all configured settings.
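
A short sketch of how this looks in practice, assuming a SparkSession is available (the builder call and the app name below are only illustrative):

from pyspark.sql import SparkSession

# Reuse or create a SparkSession; "show-conf" is just a placeholder app name.
spark = SparkSession.builder.appName("show-conf").getOrCreate()

# getAll() returns a list of (key, value) tuples with the effective settings.
for key, value in spark.sparkContext.getConf().getAll():
    print(key, value)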

贼婆χ
#5 · 2019-01-30 19:07

For Spark 2+ you can also use the following when using Scala:

spark.conf.getAll // spark is the SparkSession
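
In PySpark, the session's runtime config can at least be read one key at a time; a minimal sketch, assuming a SparkSession named spark and using spark.sql.shuffle.partitions purely as an example key:

# Read a single runtime setting from the session configuration (example key).
spark.conf.get("spark.sql.shuffle.partitions")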
姐就是有狂的资本
#6 · 2019-01-30 19:10

Yes: sc._conf.getAll()

Which uses the method:

SparkConf.getAll()

as accessed through the SparkContext's private attribute:

sc._conf

Note the underscore: that makes this tricky to discover. I had to look at the Spark source code to figure it out ;)

But it does work:

In [4]: sc._conf.getAll()
Out[4]:
[(u'spark.master', u'local'),
 (u'spark.rdd.compress', u'True'),
 (u'spark.serializer.objectStreamReset', u'100'),
 (u'spark.app.name', u'PySparkShell')]
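
Coming back to the original question, the same SparkConf object can also be asked for a single key; a sketch, assuming sc is the active SparkContext (spark.worker.dir only shows up if it was explicitly set):

# Look up one key; returns the supplied default (here None) if it was never set.
sc._conf.get("spark.worker.dir", None)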
狗以群分
#7 · 2019-01-30 19:11

Just for the record, the analogous Java version:

// sparkConf is an existing org.apache.spark.SparkConf instance
Tuple2<String, String>[] entries = sparkConf.getAll();
for (Tuple2<String, String> entry : entries) {
    System.out.println(entry);
}