I'm trying to install Sqoop 2 (version 1.99.3) on an Amazon EMR cluster (AMI version 3.2.0 / Hadoop version 2.4.0). When I start the Sqoop server, I see this error in localhost.log:
Sep 10, 2014 4:55:56 PM org.apache.catalina.core.StandardContext listenerStart
SEVERE: Exception sending context initialized event to listener instance of class org.apache.sqoop.server.ServerInitializer
java.lang.RuntimeException: Failure in server initialization
at org.apache.sqoop.core.SqoopServer.initialize(SqoopServer.java:57)
at org.apache.sqoop.server.ServerInitializer.contextInitialized(ServerInitializer.java:36)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4206)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4705)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:943)
at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:778)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:504)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
at org.apache.catalina.core.StandardService.start(StandardService.java:525)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Caused by: org.apache.sqoop.common.SqoopException: MAPREDUCE_0002:Failure on submission engine initialization
at org.apache.sqoop.submission.mapreduce.MapreduceSubmissionEngine.initialize(MapreduceSubmissionEngine.java:115)
at org.apache.sqoop.framework.JobManager.initialize(JobManager.java:215)
at org.apache.sqoop.core.SqoopServer.initialize(SqoopServer.java:53)
... 25 more
Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
at org.apache.sqoop.submission.mapreduce.MapreduceSubmissionEngine.initialize(MapreduceSubmissionEngine.java:113)
... 27 more
Here's what I've done, per the installation instructions. Note that with EMR, $HADOOP_HOME is /home/hadoop.
- I downloaded sqoop-1.99.3-bin-hadoop200.tar.gz from Apache and extracted it into $HADOOP_HOME/sqoop.
- I added the following to the common.loader property in catalina.properties: /home/hadoop/share/hadoop/common/*.jar,/home/hadoop/share/hadoop/common/lib/*.jar,/home/hadoop/share/hadoop/mapreduce/*.jar,/home/hadoop/share/hadoop/yarn/*.jar
- In sqoop.properties:
  - I replaced @LOGDIR@ with /home/hadoop/sqoop/log
  - I replaced @BASEDIR@ with /home/hadoop/sqoop
  - For the property org.apache.sqoop.submission.engine.mapreduce.configuration.directory, I replaced /etc/hadoop/conf/ with /home/hadoop/conf/
- And then I started the server: bin/sqoop.sh server start
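One quick sanity check on the steps above is to confirm that every glob entry added to common.loader actually matches files on disk. The following is only a minimal diagnostic sketch: the helper name check_loader_entries is hypothetical, and it assumes the entries are plain comma-separated paths or globs (entries with ${...} placeholders such as ${catalina.base} are not expanded and are reported as None).

```python
import glob


def check_loader_entries(common_loader):
    """Map each comma-separated loader entry to the number of paths it matches.

    Entries containing ${...} placeholders cannot be resolved here, so they
    are mapped to None instead of a count.
    """
    results = {}
    for entry in common_loader.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if "${" in entry:
            results[entry] = None  # placeholder entry, not expanded
            continue
        results[entry] = len(glob.glob(entry))
    return results


if __name__ == "__main__":
    # The entries listed in the question; adjust to match your catalina.properties.
    loader = (
        "/home/hadoop/share/hadoop/common/*.jar,"
        "/home/hadoop/share/hadoop/common/lib/*.jar,"
        "/home/hadoop/share/hadoop/mapreduce/*.jar,"
        "/home/hadoop/share/hadoop/yarn/*.jar"
    )
    for entry, count in check_loader_entries(loader).items():
        print(entry, "->", count)
```

An entry that reports 0 points at a directory with no matching jars. The check says nothing about whether the list of directories is complete for a given Hadoop layout.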
As far as I can tell from the error, the source of the problem is this line:
Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
The property it mentions is already set in mapred-site.xml:
<property><name>mapreduce.framework.name</name><value>yarn</value></property>
and that is the value I want.
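To rule out the server reading a different file than expected, the value can be read straight from the configuration directory that sqoop.properties points at. This is a rough sketch, assuming the standard Hadoop layout of property elements containing name and value children; the function name read_hadoop_property is hypothetical, and /home/hadoop/conf is the directory from the steps above.

```python
import os
import xml.etree.ElementTree as ET


def read_hadoop_property(conf_dir, name, filename="mapred-site.xml"):
    """Return the value of a Hadoop property from a *-site.xml file.

    Returns None if the file does not exist or the property is not defined.
    """
    path = os.path.join(conf_dir, filename)
    if not os.path.isfile(path):
        return None
    root = ET.parse(path).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


if __name__ == "__main__":
    # Directory taken from the question; change it if your setup differs.
    print(read_hadoop_property("/home/hadoop/conf", "mapreduce.framework.name"))
```

If this prints yarn, the property itself is fine and the problem is more likely elsewhere, for example in what is on the server's classpath.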
I suspect I'm missing a setting in the installation or configuration of Sqoop. I've tried a few other things (such as AMI 3.0.4 / Hadoop 2.2.0) but have not been able to start the Sqoop server.
FYI: I've read through this post on Kyle Mulka's blog, but it references different versions of Hadoop and Sqoop and doesn't appear to provide insight into my configuration. I've also read a few other pages on this site but haven't yet found one that references the Hadoop and Sqoop versions I'm using. And I've seen this configuration running with Cloudera (Sqoop 2 and Hadoop 2 with YARN), though I haven't been able to figure out how that would translate to an EMR installation.
Thanks to feedback on this post, I changed the common.loader property in catalina.properties from what I had before:
to this:
After that, the sqoop server started successfully.