From the SparkR shell, I'd like to generate a link to view the Spark UI while in Yarn mode. Normally the Spark UI is at port 4040, but in Yarn mode apparently it is at something like `[host]:9046/proxy/application_1234567890123_0001/`, where the last part of the path is the unique applicationId.
Other SO answers show how to get the applicationID for the Scala and Python shells. How do we get the applicationID from SparkR?
As a stab in the dark I tried `SparkR:::callJMethod(sc, "applicationId")`, but it didn't work.
I also tried something along the lines of `system("yarn application -list")`, but that doesn't seem to work from RStudio and has other limitations.
After creating the spark session you can do the following to get the Spark application id.
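A minimal sketch of one way to do this, assuming SparkR 2.x or later, where `sparkR.session()` and `sparkR.conf()` are available. Reading the id from the `spark.app.id` configuration key is an assumption about what is meant here, and the helper name `get_app_id` is made up for illustration:

```r
# Hypothetical helper: read the application id from the active session's
# runtime configuration. Assumes sparkR.session() has already been called.
get_app_id <- function() {
  # sparkR.conf() returns a named list keyed by the requested property
  SparkR::sparkR.conf("spark.app.id")[["spark.app.id"]]
}
```

In a live session you would first call `sparkR.session(master = "yarn")` and then `get_app_id()`; on YARN the result should have the `application_..._NNNN` form shown in the question.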
You can directly follow the link from the YARN web UI to get to the Spark UI. From the YARN web UI at port 8088 you can click on 'Running Applications' and that should show you a link to the Application status page.
If you want to use `callJMethod` to get the application id, you can use something like `SparkR:::callJMethod(SparkR:::callJMethod(sc, "sc"), "applicationId")`. The nested call to `sc` is needed because `sc` is a JavaSparkContext handle, and `applicationId` is only available in the Scala SparkContext.
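To tie this back to the original goal of generating a link, here is a rough sketch that unpacks the nested call into two steps and then builds the proxy URL. The helper names `app_id_from_context` and `yarn_proxy_url` are hypothetical, the `http://` scheme is an assumption, and port 9046 is simply the one quoted in the question; your cluster's proxy host and port may differ:

```r
# Hypothetical helper: unwrap the JavaSparkContext handle to reach the
# underlying Scala SparkContext, then ask it for the application id.
app_id_from_context <- function(jsc) {
  scala_sc <- SparkR:::callJMethod(jsc, "sc")
  SparkR:::callJMethod(scala_sc, "applicationId")
}

# Hypothetical helper: assemble the YARN proxy URL for the Spark UI.
# The scheme and default port are assumptions, not cluster facts.
yarn_proxy_url <- function(host, app_id, port = 9046) {
  sprintf("http://%s:%d/proxy/%s/", host, as.integer(port), app_id)
}
```

With a live session, something like `yarn_proxy_url("my-host", app_id_from_context(sc))` would produce the link, where `my-host` stands in for whatever host serves the YARN proxy.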