I have a mapper and reducer that work fine when I run them in the piped version:
cat data.csv | ./mapper.py | sort -k1,1 | ./reducer.py
I used the Elastic MapReduce wizard and set up the inputs, outputs, bootstrap action, etc. The bootstrap succeeds, but the step still fails during execution.
This is the error I'm getting in my stderr for step 1...
+ /etc/init.d/hadoop-state-pusher-control stop
+ PID_FILE=/mnt/var/run/hadoop-state-pusher/hadoop-state-pusher.pid
+ LOG_FILE=/mnt/var/log/hadoop-state-pusher/hadoop-state-pusher.out
+ SVC_FILE=/mnt/var/lib/hadoop-state-pusher/run-hadoop-state-pusher
+ case $1 in
+ stop
+ echo 0
/etc/init.d/hadoop-state-pusher-control: line 35: /mnt/var/lib/hadoop-state-pusher/run-hadoop-state-pusher: No such file or directory
+ /etc/init.d/hadoop-state-pusher-control start
+ PID_FILE=/mnt/var/run/hadoop-state-pusher/hadoop-state-pusher.pid
+ LOG_FILE=/mnt/var/log/hadoop-state-pusher/hadoop-state-pusher.out
+ SVC_FILE=/mnt/var/lib/hadoop-state-pusher/run-hadoop-state-pusher
+ case $1 in
+ start
++ dirname /mnt/var/lib/hadoop-state-pusher/run-hadoop-state-pusher
+ sudo -u hadoop mkdir -p /mnt/var/lib/hadoop-state-pusher
+ echo 1
++ dirname /mnt/var/run/hadoop-state-pusher/hadoop-state-pusher.pid
+ sudo -u hadoop mkdir -p /mnt/var/run/hadoop-state-pusher
++ dirname /mnt/var/log/hadoop-state-pusher/hadoop-state-pusher.out
+ sudo -u hadoop mkdir -p /mnt/var/log/hadoop-state-pusher
+ disown %1
+ sleep 5
+ sudo -u hadoop /usr/bin/hadoop-state-pusher -server --pidfile /mnt/var/run/hadoop-state-pusher/hadoop-state-pusher.pid
+ exit 0
Command exiting with ret '0'
This is cryptic. What on earth does this mean?
Is it having a problem mounting something? Which of the other log files is likely to say something informative, and where should I be looking?
I tried a solution I found here, which was simply to use a bigger instance, but it did not help: I got the same error message.
I was looking in the wrong log file. There was a different one (there were about six) that actually gave me some useful Python debugging information. It turned out I had used string formatting with auto-numbered fields,
"of this kind {}".format(a), not the kind with explicit digits, "{0} and {1}".format(a, b)
and auto-numbered fields are unsupported in Python < 2.7, which was what was installed by default on the EC2 image used in Elastic MapReduce.
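
For anyone hitting the same thing, here is a minimal sketch of a format call that should run on Python 2.6 as well as on newer versions. The function names `describe` and `log_python_version` are made up for illustration and are not from my actual mapper. Writing the interpreter version to stderr is just one way to confirm what the cluster is really running.

```python
import sys


def describe(a, b):
    # Explicit indices ({0}, {1}) work on Python 2.6, 2.7 and 3.x.
    # Auto-numbered fields ("{} {}") fail on 2.6 with
    # "ValueError: zero length field name in format".
    return "first: {0}, second: {1}".format(a, b)


def log_python_version(stream=sys.stderr):
    # Write to stderr so the message does not end up in stdout,
    # which a streaming job treats as mapper/reducer output.
    stream.write("python version: {0}.{1}\n".format(
        sys.version_info[0], sys.version_info[1]))


if __name__ == "__main__":
    log_python_version()
    print(describe("a", "b"))
```

The version line should then show up in the task's stderr log, next to any Python traceback.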