I need to run a Hadoop jar job using Luigi from python. I searched and found examples of writing mapper and reducer in Luigi but nothing to directly run a Hadoop jar.
I need to run a Hadoop jar compiled directly. How can I do it?
I need to run a Hadoop jar job using Luigi from python. I searched and found examples of writing mapper and reducer in Luigi but nothing to directly run a Hadoop jar.
I need to run a Hadoop jar compiled directly. How can I do it?
You need to use the
luigi.contrib.hadoop_jar
package (code).In particular, you need to extend
HadoopJarJobTask
. For example, like that:You can also include building a jar file with maven to the workflow: