I'm trying to run a MapReduce job on Hadoop using Streaming. I have two Ruby scripts, wcmapper.rb and wcreducer.rb, and I'm attempting to run the job as follows:
hadoop jar hadoop/contrib/streaming/hadoop-streaming-1.2.1.jar -file wcmapper.rb -mapper wcmapper.rb -file wcreducer.rb -reducer wcreducer.rb -input test.txt -output output
This results in the following error message at the console:
13/11/26 12:54:07 INFO streaming.StreamJob: map 0% reduce 0%
13/11/26 12:54:36 INFO streaming.StreamJob: map 100% reduce 100%
13/11/26 12:54:36 INFO streaming.StreamJob: To kill this job, run:
13/11/26 12:54:36 INFO streaming.StreamJob: /home/paul/bin/hadoop-1.2.1/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201311261104_0009
13/11/26 12:54:36 INFO streaming.StreamJob: Tracking URL: http://localhost.localdomain:50030/jobdetails.jsp?jobid=job_201311261104_0009
13/11/26 12:54:36 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201311261104_0009_m_000000
13/11/26 12:54:36 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
The log for each failed task attempt shows:
java.io.IOException: Cannot run program "/var/lib/hadoop/mapred/local/taskTracker/paul/jobcache/job_201311261104_0010/attempt_201311261104_0010_m_000001_3/work/./wcmapper.rb": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1042)
I understand that Hadoop needs to copy the mapper and reducer scripts to every node, and I believe that is the purpose of the -file arguments. However, the scripts do not seem to be copied to the location where Hadoop expects to find them. The console output suggests they are being packaged:
packageJobJar: [wcmapper.rb, wcreducer.rb, /var/lib/hadoop/hadoop-unjar3547645655567272034/] [] /tmp/streamjob3978604690657430710.jar tmpDir=null
I have also tried the following:
hadoop jar hadoop/contrib/streaming/hadoop-streaming-1.2.1.jar -files wcmapper.rb,wcreducer.rb -mapper wcmapper.rb -reducer wcreducer.rb -input test.txt -output output
but this gives the same error.
Can anyone tell me what the problem is?
Or where to look to better diagnose the issue?
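One thing that can be checked locally before resubmitting: as far as I know, "error=2, No such file or directory" can be raised not only when the script itself is missing, but also when the interpreter named on its shebang line cannot be found (for example because of Windows line endings). The sketch below is a hypothetical helper, check_streaming_script, which is not part of Hadoop. It only inspects the local copy of a script and assumes the task nodes have the same interpreter paths as the machine it runs on.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): reports common reasons why
# running a script directly, as Streaming does with ./script, could fail.
# Usage, from the directory holding the scripts:
#   check_streaming_script wcmapper.rb
check_streaming_script() {
    script="$1"

    # 1. The file itself must exist.
    if [ ! -f "$script" ]; then
        echo "missing: $script"
        return 1
    fi

    # 2. It must carry an executable bit to be launched directly.
    if [ ! -x "$script" ]; then
        echo "not executable: $script"
        return 1
    fi

    # 3. Without a shebang the kernel cannot pick an interpreter.
    first_line=$(head -n 1 "$script")
    case "$first_line" in
        '#!'*) ;;
        *) echo "no shebang line: $script"; return 1 ;;
    esac

    # 4. With CRLF line endings the kernel looks for e.g. "ruby\r".
    cr=$(printf '\r')
    case "$first_line" in
        *"$cr") echo "CRLF line endings: $script"; return 1 ;;
    esac

    # 5. The interpreter named in the shebang must exist on this machine.
    #    (For "#!/usr/bin/env ruby" this only verifies /usr/bin/env.)
    interpreter=$(printf '%s\n' "$first_line" | sed 's/^#![[:space:]]*//' | awk '{print $1}')
    if [ ! -x "$interpreter" ]; then
        echo "interpreter not found: $interpreter"
        return 1
    fi

    echo "ok: $script"
}
```

If both scripts pass, the job can also be approximated without Hadoop by piping the input through them, e.g. cat test.txt | ./wcmapper.rb | sort | ./wcreducer.rb, which should surface any error in the scripts themselves.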
Many thanks
Paul