How to submit a Hadoop MR job remotely on Amazon EMR

Posted 2019-03-04 00:48

Question:

Current situation: I have an EMR cluster. On the master node, I have a Python program that makes a subprocess call to run a script containing the following line. The subprocess triggers the MR job, which writes output to HDFS that I use later.

/usr/bin/hadoop jar test.jar testing.jobs.TestFeatureJob /in/f1.txt /in/f2.txt

What do I want to do? I want to decouple this part: run the Python program locally on my laptop or on a separate EC2 instance, but still submit the MR job to the EMR cluster. Assume test.jar is already on the EMR master node.

How do I submit this remotely? I am using Python, and the JAR should be treated as a black box. Is there a package I can use to submit the jobs? Do I have to specify something like the IP of the master node to be able to run this?
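One possible direction, as a rough sketch and not a confirmed answer: the boto3 package exposes the EMR API, and its `add_job_flow_steps` call can add a step that runs the same `hadoop jar` command on the cluster through `command-runner.jar` (available on EMR release 4.x and later). With this route the cluster is identified by its cluster ID, not by the master node's IP. Everything specific below is an assumption made for illustration: the JAR location `/home/hadoop/test.jar`, the cluster ID placeholder, the region, the step name, and the helper names `build_hadoop_jar_step` and `submit_step`. AWS credentials are assumed to be configured on the machine running the script.

```python
from typing import Dict, List


def build_hadoop_jar_step(jar_path: str, main_class: str, job_args: List[str],
                          name: str = "TestFeatureJob") -> Dict:
    """Build an EMR step definition that runs `hadoop jar` on the master node.

    jar_path is a path on the master node (hypothetical here, since the
    question does not say where test.jar lives).
    """
    return {
        "Name": name,
        # Keep the cluster running even if this step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar executes the given command on the master node.
            "Jar": "command-runner.jar",
            "Args": ["hadoop", "jar", jar_path, main_class] + list(job_args),
        },
    }


def submit_step(cluster_id: str, step: Dict, region: str = "us-east-1",
                wait: bool = True) -> str:
    """Submit the step to a running cluster and return the step ID.

    cluster_id looks like "j-XXXXXXXXXXXXX"; region is an assumption.
    """
    import boto3  # imported here so building a step does not require boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    step_id = response["StepIds"][0]
    if wait:
        # Block until the step finishes; raises if the step fails or times out.
        emr.get_waiter("step_complete").wait(ClusterId=cluster_id, StepId=step_id)
    return step_id
```

From a laptop or a separate EC2 instance, this would be used roughly as `step = build_hadoop_jar_step("/home/hadoop/test.jar", "testing.jobs.TestFeatureJob", ["/in/f1.txt", "/in/f2.txt"])` followed by `submit_step("j-XXXXXXXXXXXXX", step)`. The job output still lands in HDFS on the cluster, so reading it from outside the cluster would need a separate step (for example, copying it to S3), which this sketch does not cover.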