How to get the filename from a streaming mapre

Published 2019-02-18 06:34


我欲成王，谁敢阻挡


Question:

I am streaming an R mapreduce job and I need to get the filename. I know that Hadoop sets environment variables for the current job before it starts, and I can access them in R with Sys.getenv().
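For reference, a minimal sketch of how such variables can be inspected from inside a streaming R script. It assumes only base R. The pattern "input|file" is an arbitrary filter chosen for illustration, and the exact variable names depend on the Hadoop version (streaming is documented to expose job properties with dots replaced by underscores), so listing what is actually set is safer than guessing a name.

```r
# Collect the names of all environment variables visible to this process.
vars <- names(Sys.getenv())

# Keep the names that mention "input" or "file" (illustrative filter only).
hits <- grep("input|file", vars, ignore.case = TRUE, value = TRUE)

# Log to stderr with message(); in a streaming job, stdout is reserved
# for the key/value records emitted by the mapper or reducer.
for (v in hits) {
  message(v, " = ", Sys.getenv(v))
}

# Read a single variable with a fallback, so the script also runs
# outside Hadoop, where the variable is not set.
job_id <- Sys.getenv("mapred_job_id", unset = NA)
```

Run inside a mapper, the stderr lines end up in the task logs, which shows which of the variables actually carries the input path on a given cluster.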

I found: Get input file name in streaming hadoop program

and Sys.getenv("mapred_job_id") works fine, but it is not what I need. I just need the filename, not the job id or name. I also found: How to get filename when running mapreduce job on EC2?

But this isn't helpful either. What is the easiest way to get the current filename while streaming from R? Thank you

Answer 1:

I have not tried this, but from the second link you provided, it seems that this is available in an environment variable called map.input.file. Then, this should work: