I'm setting up an Apache Spark cluster to perform real-time streaming computations, and I would like to monitor the performance of the deployment by tracking metrics such as batch sizes, batch processing times, etc. My Spark Streaming program is written in Scala.
Questions
1. The Spark monitoring REST API description lists the various endpoints available. However, I couldn't find endpoints that expose batch-level info. Is there a way to get a list of all the Spark batches that have been run for an application, along with per-batch details such as the following?
   - Number of events per batch
   - Processing time
   - Scheduling delay
   - Exit status, i.e., whether the batch was processed successfully or not
2. In case such a batch-level API is unavailable, can batch-level stats (e.g., size, processing time, scheduling delay) be obtained by adding custom instrumentation to the Spark Streaming program?
Thanks in advance!
If you have no luck with 1., this will help with 2.:
Taken from *In Spark Streaming, is there a way to detect when a batch has finished?*:
`batchCompleted.batchInfo` contains:

- `numRecords`
- `batchTime`
- `processingStartTime`
- `processingEndTime`
- `schedulingDelay`
- `outputOperationInfos`
Hopefully you can get what you need from those properties.
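To make that concrete, here's a minimal Scala sketch of the custom instrumentation asked about in 2.: a `StreamingListener` that overrides `onBatchCompleted` and pulls the stats above out of `batchInfo`. The class name `BatchStatsListener` is just my placeholder, and it assumes Spark 1.6+, where `outputOperationInfos` (with its `failureReason` field) is available for judging whether a batch succeeded:

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Hypothetical listener: prints per-batch stats as each batch completes.
class BatchStatsListener extends StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    // Count output operations that failed; a batch with zero failures succeeded.
    val failedOps = info.outputOperationInfos.values.count(_.failureReason.isDefined)
    println(
      s"Batch ${info.batchTime}: numRecords=${info.numRecords}, " +
      s"schedulingDelay=${info.schedulingDelay.getOrElse(-1L)} ms, " +
      s"processingDelay=${info.processingDelay.getOrElse(-1L)} ms, " +
      s"failedOutputOps=$failedOps")
  }
}

// Register on your StreamingContext (here assumed to be named ssc),
// typically before calling ssc.start():
// ssc.addStreamingListener(new BatchStatsListener)
```

Note that `schedulingDelay` and `processingDelay` are `Option[Long]` values in milliseconds; they are empty until the batch has actually been scheduled and processed, hence the `getOrElse`. Instead of `println`, you could push these values to whatever metrics sink you use.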