Hadoop global variable with streaming

2019-07-10 07:36发布

站内文章 / 前端开发

26 0

叛逆

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I understand that i can give some global value to my mappers via the Job and the Configuration.

But how can i do that using Hadoop Streaming(Python in my case)?

What is the right way?

回答1:

Based on the docs you can specify a command line option (-cmdenv name=value) to set environment variables on each distributed machine that you can then use in your mappers/reducers:

$HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
    -input input.txt \
    -output output.txt \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py \
    -file reducer.py \
    -cmdenv MY_PARAM=thing_I_need