Hadoop on CentOS streaming example with Python

Published 2019-07-09 05:22

Question:

I have been able to set up the streaming example with a Python mapper & reducer. The mapred folder location is /mapred/local/taskTracker; both the root and mapred users have ownership of this folder and its subfolders.

However, when I run my streaming job it creates the map tasks but no reduce tasks, and it fails with the following error: Cannot Run Program /mapred/local/taskTracker/root/jobcache/job_201303071607_0035/attempt_201303071607_0035_m_000001_3/work/./mapper1.py Permission Denied
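For context, a streaming mapper such as the mapper1.py named in the error is just a script that reads lines from stdin and writes tab-separated key/value pairs to stdout. The original mapper1.py is not shown in the question, so the following word-count mapper is only a stand-in sketch of what such a script typically looks like:

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming mapper (word count). This is a hypothetical
# stand-in for the mapper1.py mentioned in the error, not the original.
import sys

def map_line(line):
    """Turn one input line into a list of 'word<TAB>1' strings,
    the key/value format Hadoop Streaming expects on stdout."""
    return ["%s\t1" % word for word in line.strip().split()]

if __name__ == "__main__":
    for line in sys.stdin:
        for pair in map_line(line):
            print(pair)
```

Note that the TaskTracker must be able to execute this file in the job's work directory, which is exactly where the permission error above occurs.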

I noticed that although I have given a+rwx permissions to /mapred/local/taskTracker and all its subdirectories, when MapReduce creates the temporary folders for this job, those folders do not have rwx for all users, and hence I get the permission denied error.
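A quick way to confirm this observation is to walk the directory tree and list every directory whose mode lacks read/write/execute for "other" users. This is only a diagnostic sketch; the root path to scan (here the taskTracker directory from the question) is whatever your cluster actually uses:

```python
import os
import stat

def missing_world_rwx(root):
    """Return directories under root whose permission bits lack
    read, write, or execute for 'other' users."""
    want = stat.S_IROTH | stat.S_IWOTH | stat.S_IXOTH
    bad = []
    for dirpath, _, _ in os.walk(root):
        mode = stat.S_IMODE(os.stat(dirpath).st_mode)
        if (mode & want) != want:
            bad.append(dirpath)
    return bad

# Example usage (path from the question, adjust to your setup):
# print(missing_world_rwx("/mapred/local/taskTracker"))
```

Any job-cache directory reported by this check would reproduce the "Permission Denied" failure, since the task attempt cannot execute the mapper script inside it.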

I have been looking for forum threads on this, and though there are threads mentioning the same error, I could not find any responses with a resolution.

Any help would be greatly appreciated.

Answer 1:

I assume that you run your Hadoop daemons as user root. In that case the permissions of newly created files are determined by root's umask. However, you must not change the umask for root.

If you'd like to run the cluster and MapReduce jobs as different users, it would be better to start the Hadoop daemons as user hadoop and the MapReduce jobs as user mapreduce. Both users should belong to the same group, e.g. hadoop. Furthermore, the umask for user hadoop should be set accordingly.
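The umask point is worth seeing concretely: when a process creates a file requesting mode 0666, the bits set in its umask are stripped away, which is exactly why directories created by the daemon can end up without rwx for other users. A small sketch of this mechanism (the specific mask values are illustrative, not the ones your cluster necessarily uses):

```python
import os
import stat
import tempfile

def mode_under_umask(mask):
    """Create a file requesting mode 0666 while the given umask is in
    effect, and return the permission bits the file actually received."""
    old = os.umask(mask)          # install the umask to demonstrate
    try:
        path = os.path.join(tempfile.mkdtemp(), "probe")
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
        os.close(fd)
        return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)             # always restore the previous umask
```

With a umask of 0o022 the file comes out as 0o644 (group and others lose write), while a restrictive umask like 0o077 yields 0o600, locking out everyone but the owner, which mirrors the permission problem described in the question.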