Is it possible to run Hive steps using Boto3? I have been doing so with the AWS CLI, but from the docs (http://boto3.readthedocs.org/en/latest/reference/services/emr.html#EMR.Client.add_job_flow_steps), it seems like only JARs are accepted. If Hive steps are possible, where can I find resources on them?
Thanks
In the previous version of Boto, there was a helper class named HiveStep which made it easy to construct a job flow step for executing a Hive job. However, in Boto3 the approach has changed: the classes are generated at run time from the AWS API definitions, and as a result no such helper class exists.
Looking at the source code of HiveStep (https://github.com/boto/boto/blob/2d7796a625f9596cbadb7d00c0198e5ed84631ed/boto/emr/step.py), it can be seen that it is a subclass of Step, a class with the properties jar, args and mainclass, very similar to the requirements in Boto3. It turns out that all job flow steps on EMR, including Hive ones, still need to be instantiated from a JAR. Therefore you can execute Hive steps through Boto3, but there is no helper class to make it easy to construct the definition.
By looking at the approach used by HiveStep in the previous version of Boto, you could construct a valid job flow definition yourself. Alternatively, you could fall back to using the previous version of Boto.
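As a rough sketch of what "constructing the definition yourself" can look like, the helper below builds a plain step dictionary in the shape that the Boto3 EMR client's add_job_flow_steps expects (Name, ActionOnFailure, HadoopJarStep with Jar and Args). It assumes an EMR release 4.x or later cluster, where a Hive step is commonly expressed as command-runner.jar running hive-script; older AMI-based clusters used a different runner, which is what the old HiveStep targeted. The function names build_hive_step and submit_hive_step, the S3 path and the cluster ID are hypothetical placeholders, not part of Boto3.

```python
from typing import Any, Dict, List, Optional


def build_hive_step(name: str,
                    script_uri: str,
                    hive_vars: Optional[Dict[str, str]] = None,
                    action_on_failure: str = "CONTINUE") -> Dict[str, Any]:
    """Hypothetical helper: build one Hive step as a plain dict.

    Assumes EMR release 4.x+, where command-runner.jar can invoke
    hive-script to run a Hive script stored in S3.
    """
    # hive-script forwards everything after --args to Hive:
    # -f names the script file to execute.
    args: List[str] = ["hive-script", "--run-hive-script", "--args", "-f", script_uri]
    # Optional Hive variables, passed as "-d KEY=VALUE" pairs.
    for key, value in (hive_vars or {}).items():
        args += ["-d", f"{key}={value}"]
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": args,
        },
    }


def submit_hive_step(emr_client: Any, cluster_id: str, step: Dict[str, Any]) -> str:
    """Hypothetical helper: submit one step and return its step ID.

    emr_client is expected to be a Boto3 EMR client, e.g. boto3.client("emr").
    """
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```

Usage would then be along the lines of creating emr = boto3.client("emr") and calling submit_hive_step(emr, "j-XXXXXXXXXXXX", build_hive_step("my hive step", "s3://my-bucket/query.q")), with your own cluster ID and script location. Keeping the dictionary construction separate from the API call makes it easy to inspect the generated Args and compare them with what the AWS CLI produces for the same Hive step.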
I was able to get this to work using Boto3: