What does “Stage Skipped” mean in Apache Spark web UI?

Posted 2019-01-03 09:21

From my Spark UI. What does it mean by skipped?

[Screenshot from the Spark UI]

1 answer
时光不老,我们不散
#2 · 2019-01-03 10:17

Typically it means that the data was fetched from cache, so there was no need to re-execute the given stage. This is consistent with your DAG, which shows that the next stage requires shuffling (reduceByKey). Whenever shuffling is involved, Spark automatically caches the generated data:

Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files are preserved until the corresponding RDDs are no longer used and are garbage collected. This is done so the shuffle files don’t need to be re-created if the lineage is re-computed.
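
To see this for yourself, here is a minimal sketch, assuming a local Spark installation with the Scala API. The object name, the app name, and the sample data are made up for illustration. It runs two actions on the same reduceByKey result; the second job should be able to reuse the shuffle files written by the first, so its map-side stage would be expected to show up as skipped in the UI.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; names and data are illustrative only.
object SkippedStageDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("skipped-stage-demo")
      .getOrCreate()
    val sc = spark.sparkContext

    // reduceByKey introduces a shuffle, which splits the lineage into
    // a map-side stage and a reduce-side stage.
    val counts = sc.parallelize(Seq("a", "b", "a", "c", "b", "a"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // First action: both stages run and the shuffle files are written.
    counts.count()

    // Second action on the same RDD: the shuffle output already exists,
    // so the map-side stage does not need to be re-executed.
    counts.collect().foreach(println)

    spark.stop()
  }
}
```

The UI goes away once spark.stop() is called, so it is probably easier to paste the body into spark-shell (using its built-in sc) and then look at the Jobs tab, by default at http://localhost:4040.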
