I am running a Kinesis plus Spark application: https://spark.apache.org/docs/1.2.0/streaming-kinesis-integration.html
I am running the command below on an EC2 instance:
./spark/bin/spark-submit --class org.apache.spark.examples.streaming.myclassname --master yarn-cluster --num-executors 2 --driver-memory 1g --executor-memory 1g --executor-cores 1 /home/hadoop/test.jar
I have installed Spark on EMR.
EMR details:
Master instance group - 1 Running MASTER m1.medium
Core instance group - 2 Running CORE m1.medium
I am getting the INFO output below and it never ends.
15/06/14 11:33:23 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
15/06/14 11:33:23 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (2048 MB per container)
15/06/14 11:33:23 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
15/06/14 11:33:23 INFO yarn.Client: Setting up container launch context for our AM
15/06/14 11:33:23 INFO yarn.Client: Preparing resources for our AM container
15/06/14 11:33:24 INFO yarn.Client: Uploading resource file:/home/hadoop/.versions/spark-1.3.1.e/lib/spark-assembly-1.3.1-hadoop2.4.0.jar -> hdfs://172.31.13.68:9000/user/hadoop/.sparkStaging/application_1434263747091_0023/spark-assembly-1.3.1-hadoop2.4.0.jar
15/06/14 11:33:29 INFO yarn.Client: Uploading resource file:/home/hadoop/test.jar -> hdfs://172.31.13.68:9000/user/hadoop/.sparkStaging/application_1434263747091_0023/test.jar
15/06/14 11:33:31 INFO yarn.Client: Setting up the launch environment for our AM container
15/06/14 11:33:31 INFO spark.SecurityManager: Changing view acls to: hadoop
15/06/14 11:33:31 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/06/14 11:33:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/06/14 11:33:31 INFO yarn.Client: Submitting application 23 to ResourceManager
15/06/14 11:33:31 INFO impl.YarnClientImpl: Submitted application application_1434263747091_0023
15/06/14 11:33:32 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:32 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1434281611893
final status: UNDEFINED
tracking URL: http://172.31.13.68:9046/proxy/application_1434263747091_0023/
user: hadoop
15/06/14 11:33:33 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:34 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:35 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:36 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:37 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:38 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:39 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:40 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
15/06/14 11:33:41 INFO yarn.Client: Application report for application_1434263747091_0023 (state: ACCEPTED)
Could somebody please let me know why it's not working?
There are three ways we can try to fix this issue.
First, check whether any stray Spark processes are still running on the machine; take all the process IDs belonging to Spark processes and kill them.
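Something along these lines should work (just a sketch; the PIDs are placeholders, not values from the original post):

# find any Spark processes still running on this machine
ps -ef | grep spark
# kill the stray Spark processes by PID
sudo kill -9 <pid1> <pid2>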
Second, check the Spark applications already running on the cluster. To check this, list the applications known to YARN; the output will show each application's ID and its state.
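Listing them with the standard YARN CLI (a sketch):

# list applications currently submitted to or running on YARN
yarn application -list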
Check the application IDs; if there are more than one, or more than two, kill the extras. Your cluster cannot run more than 2 Spark applications at the same time. I am not 100% sure about this, but if you run more than two Spark applications on the cluster, it will start complaining. So kill them.
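Killing them can be done with the YARN CLI as well (a sketch; substitute the real application ID):

# kill an extra or stuck application by its ID
yarn application -kill <application_id>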
I got this error in a similar situation. The relevant logs were those for container_1453825604297_0001_02_000001 (from the ResourceManager web UI).
I worked around it by using yarn-cluster mode: MASTER=yarn-cluster.
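Roughly like this (a sketch only; the class and jar are copied from the question for illustration):

# run the driver inside the cluster (yarn-cluster) rather than on the submitting machine (yarn-client)
./spark/bin/spark-submit --master yarn-cluster --class org.apache.spark.examples.streaming.myclassname /home/hadoop/test.jar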
On another computer which is configured in a similar way, but whose IP is reachable from the cluster, both yarn-client and yarn-cluster work.
Others may encounter this error for different reasons; my point is that checking the error logs (not visible from the terminal, but in the ResourceManager web UI in this case) almost always helps.
I had the same problem on a local Hadoop cluster with Spark 1.4 and YARN, trying to run spark-shell. The cluster had more than enough resources.
What helped was running the same thing from an interactive LSF job on the cluster. So perhaps there were some network limitations when running YARN from the head node...
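Roughly what I mean (a sketch, assuming LSF's bsub is available on the cluster):

# request an interactive shell on a compute node instead of the head node
bsub -Is bash
# then start spark-shell against YARN from inside that job
spark-shell --master yarn-client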
I had this exact problem when multiple users were trying to run jobs on our cluster at once. The fix was to change a setting of the scheduler.
In the file /etc/hadoop/conf/capacity-scheduler.xml we changed the property yarn.scheduler.capacity.maximum-am-resource-percent from 0.1 to 0.5.
Changing this setting increases the fraction of the cluster's resources that can be allocated to application masters, which increases the number of application masters that can run at once and hence the number of possible concurrent applications.
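For reference, the relevant entry in capacity-scheduler.xml looks roughly like this (a sketch; the rest of the file is omitted):

<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>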
I hit the same problem on an MS Azure HDInsight Spark cluster.
I finally found out that the issue was the cluster not being able to talk back to the driver. I assume you used client mode when submitting the job, since you can see this debug log.
The reason is that the Spark executors have to talk to the driver program, and the TCP connection has to be bi-directional. So if your driver program is running in a VM (EC2 instance) which is not reachable via hostname or IP (which you have to specify in the Spark conf; it defaults to the hostname), your status will stay ACCEPTED forever.
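A sketch of making the driver address explicit in client mode; spark.driver.host is the Spark property for this, and the IP is a placeholder:

# tell executors to connect back to an address that is reachable from the cluster
./spark/bin/spark-submit --master yarn-client \
  --conf spark.driver.host=<driver-ip-reachable-from-cluster> \
  --class org.apache.spark.examples.streaming.myclassname /home/hadoop/test.jar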
When running with yarn-cluster, all of the application's logging and stdout will be located on the assigned YARN application master and will not appear in the spark-submit output. Also, being a streaming job, the application usually does not exit. Check the Hadoop ResourceManager web interface, and look at the Spark web UI and logs, which will be available from the Hadoop UI.
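To pull those logs from the command line, the standard YARN CLI can be used (a sketch; the application ID is the one from the question's output, substitute your own):

# fetch the aggregated container logs, including the application master's stdout
# (depending on the Hadoop version this may only work once the application has finished and log aggregation is enabled)
yarn logs -applicationId application_1434263747091_0023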