I understand that i can give some global value to my mappers via the Job and the Configuration.
But how can i do that using Hadoop Streaming(Python in my case)?
What is the right way?
I understand that i can give some global value to my mappers via the Job and the Configuration.
But how can i do that using Hadoop Streaming(Python in my case)?
What is the right way?
Based on the docs you can specify a command line option (-cmdenv name=value
) to set environment variables on each distributed machine that you can then use in your mappers/reducers:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input input.txt \
-output output.txt \
-mapper mapper.py \
-reducer reducer.py \
-file mapper.py \
-file reducer.py \
-cmdenv MY_PARAM=thing_I_need