PySpark: append to an executor environment variable

Posted 2019-01-19 07:22

Question:

Is it possible to append a value to the PYTHONPATH of a worker in Spark?

I know I could go to each worker node and set it in the spark-env.sh file, but I want a more flexible approach.

I am trying to use the setExecutorEnv method, but with no success:

from pyspark import SparkConf

conf = SparkConf().setMaster("spark://192.168.10.11:7077")\
              .setAppName('myname')\
              .set("spark.cassandra.connection.host", "192.168.10.11")\
              .setExecutorEnv('PYTHONPATH', '$PYTHONPATH:/custom_dir_that_I_want_to_append/')

It creates a pythonpath environment variable on each executor, forces the name to lowercase, and does not expand $PYTHONPATH, so the value is not appended.

I end up with two different environment variables:

pythonpath  :  $PYTHONPATH:/custom_dir_that_I_want_to_append
PYTHONPATH  :  /old/path/to_python

The first one is dynamically created and the second one already existed before.

Does anyone know how to do it?

Answer 1:

I figured it out myself.

The problem is not in Spark but in ConfigParser, which lowercases option names by default.

Based on this answer, I changed my ConfigParser setup to always preserve case.
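The answer does not show the change itself. As a minimal sketch of one common way to make ConfigParser preserve case, assuming Python 3's configparser module, one can set optionxform to str. The [executor_env] section name and the sample contents are hypothetical, chosen only for illustration; the original config file is not shown in the thread.

```python
from configparser import ConfigParser

# Hypothetical config contents; the section name is illustrative only.
SAMPLE = """
[executor_env]
PYTHONPATH = /custom_dir_that_I_want_to_append/
"""


def read_keys(preserve_case):
    """Return the option names of the sample section as the parser sees them."""
    parser = ConfigParser()
    if preserve_case:
        # optionxform is applied to every option name; the default lowercases it.
        parser.optionxform = str
    parser.read_string(SAMPLE)
    return list(parser["executor_env"].keys())


default_keys = read_keys(preserve_case=False)
preserved_keys = read_keys(preserve_case=True)
print(default_keys)
print(preserved_keys)
```

With the default parser the key comes back lowercased, which would explain the extra pythonpath variable described in the question; with optionxform set to str the name is passed through unchanged.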

After this, I found that Spark's default behavior is to append the value to an existing worker environment variable when one with the same name already exists.

So it is not necessary to include $PYTHONPATH in the value:

.setExecutorEnv('PYTHONPATH', '/custom_dir_that_I_want_to_append/')