I am using the Isolated mode of Zeppelin's Spark interpreter. In this mode Zeppelin starts a new job on the Spark cluster for each notebook. I want to kill that job via Zeppelin when the notebook execution is completed, so I called `sc.stop()`. This stopped the SparkContext, and the job was also removed from the Spark cluster. But the next time I try to run the notebook, the SparkContext does not start again. How can I do that?
While working with Zeppelin and Spark I also stumbled upon the same problem and made some investigations. After some time, my first conclusion was that:

- calling `sc.stop()` in a paragraph stops the SparkContext, and
- the SparkContext can only be brought back by restarting the Spark Interpreter from the UI (the restart button).

However, since the UI allows restarting the Spark Interpreter via a button press, why not just reverse engineer the API call behind the restart button? It turned out that restarting the Spark Interpreter comes down to a plain HTTP request against Zeppelin's REST API. Fortunately, Zeppelin can work with multiple interpreters, one of which is a shell interpreter. I therefore created two paragraphs.

The first paragraph was for stopping the SparkContext whenever needed:
The second paragraph was for restarting the SparkContext programmatically:
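A sketch of that restart paragraph, assuming Zeppelin listens on `localhost:8080`; it uses the interpreter-setting restart endpoint of Zeppelin's REST API. `<setting-id>` is a placeholder: look up the Spark interpreter's setting ID via `GET /api/interpreter/setting` or in the URL of the interpreter page.

```shell
%sh
# "Press" the restart button programmatically via Zeppelin's REST API.
# Replace <setting-id> with the Spark interpreter's setting id.
curl -X PUT http://localhost:8080/api/interpreter/setting/restart/<setting-id>
```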
After stopping and restarting the SparkContext with these two paragraphs, I ran another paragraph to check whether the restart had worked... and it had! So while this is no official solution and more of a workaround, it is still legitimate, as we do nothing other than "pressing" the restart button from within a paragraph!

Zeppelin version: 0.8.1
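The verification paragraph can be as small as touching the context again (a sketch; any action that forces the interpreter to hand you a fresh SparkContext will do):

```scala
%spark
// After a successful restart this prints a new application id
// instead of failing with "Cannot call methods on a stopped SparkContext".
println(sc.applicationId)
```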
I investigated why sc stops in Spark in yarn-client mode. I found that it is a problem in Spark itself (Spark version >= 1.6). In yarn-client mode, the ApplicationMaster (AM) connects to the Driver over two RPC connections: it sets up a NettyRpcEndpointRef to the driver's 'YarnSchedulerBackend' service on the 'SparkDriver' server, and a second connection to the 'YarnAM' endpoint.

There are no heartbeats on these RPC connections between the AM and the Driver, so the only way the AM can tell whether the Driver is still connected is the OnDisconnected method of the 'YarnAM' endpoint. When the TCP connection between them is dropped, or TCP keep-alive detects that the peer is dead (after roughly two hours by default on Linux), the disconnect message from the NettyRpcEndpointRef is 'postToAll'-ed through the RPC handler to the 'YarnAM' endpoint, and the AM marks the application as SUCCESS.

When the Driver's monitor process then sees the YARN application state change to SUCCESS, it stops the sc.

So the root cause is that, in yarn-client mode, the AM does not retry the connection to check whether the Driver is actually alive; it just marks the YARN application as finished as quickly as possible. Maybe Spark could fix this issue.
You can restart the interpreter for the notebook in the interpreter bindings (gear in the upper right corner) by clicking the restart icon to the left of the interpreter in question (in this case, the Spark interpreter).
It's a bit counter-intuitive, but you need to use the interpreter menu tab instead of stopping the SparkContext directly:

- Go to the interpreter list.
- Find the Spark interpreter and click restart in the upper right corner.