What does the number in brackets after RDD mean?
The number after RDD is its identifier, the same value returned by `rdd.id`.
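When an RDD is printed, the id appears in brackets between the class name and the place where the RDD was created, optionally preceded by a name set with `setName`. The Java sketch below is loosely modelled on how Spark's `RDD.toString` assembles that label. The `rddLabel` helper is hypothetical (not Spark API) and the arguments are made-up example values.

```java
import java.util.Optional;

class RddLabelDemo {
    public static void main(String[] args) {
        // Unnamed RDD: class name, id in brackets, then where it was created.
        System.out.println(rddLabel(Optional.empty(), "ParallelCollectionRDD", 2, "parallelize at <console>:24"));
        // After something like rdd.setName("pairs1"), the name is prepended.
        System.out.println(rddLabel(Optional.of("pairs1"), "ParallelCollectionRDD", 2, "parallelize at <console>:24"));
    }

    // Hypothetical helper: builds "[name ]ClassName[id] at creationSite".
    static String rddLabel(Optional<String> name, String className, int id, String creationSite) {
        String prefix = name.map(n -> n + " ").orElse("");
        return String.format("%s%s[%d] at %s", prefix, className, id, creationSite);
    }
}
```

This prints `ParallelCollectionRDD[2] at parallelize at <console>:24` followed by `pairs1 ParallelCollectionRDD[2] at parallelize at <console>:24`.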
It is unique within its SparkContext and is used to track RDDs across the session, for example for caching. The number is simply an incrementing integer (`nextRddId` is just an `AtomicInteger`) that is generated when the RDD is constructed.
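A minimal Java sketch of the mechanism described above: a context owns one `AtomicInteger`, each RDD takes the next value with `getAndIncrement()` in its constructor, and cached RDDs are tracked in a map keyed by that id (in the spirit of `sc.getPersistentRDDs`). `ToyContext`, `ToyRdd` and their methods are hypothetical stand-ins, not Spark classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class RddIdCounterDemo {
    public static void main(String[] args) {
        ToyContext sc = new ToyContext();
        ToyRdd a = new ToyRdd(sc);
        ToyRdd b = new ToyRdd(sc);
        b.cache();
        System.out.println(a.id() + " " + b.id());        // 0 1
        System.out.println(sc.persistentRdds().keySet()); // [1]
    }
}

// Toy stand-in for SparkContext (not Spark code).
class ToyContext {
    // Shared counter: every RDD created through this context draws the next value.
    private final AtomicInteger nextRddId = new AtomicInteger(0);
    // Cached RDDs are tracked by id, in the spirit of sc.getPersistentRDDs.
    private final Map<Integer, ToyRdd> persistent = new HashMap<>();

    int newRddId() {
        return nextRddId.getAndIncrement(); // yields 0, 1, 2, ... in creation order
    }

    void registerPersistent(ToyRdd rdd) {
        persistent.put(rdd.id(), rdd);
    }

    Map<Integer, ToyRdd> persistentRdds() {
        return Map.copyOf(persistent);
    }
}

// Toy stand-in for an RDD: the id is assigned once, in the constructor.
class ToyRdd {
    private final ToyContext sc;
    private final int id;

    ToyRdd(ToyContext sc) {
        this.sc = sc;
        this.id = sc.newRddId();
    }

    int id() { return id; }

    // Marking for caching only records the id here; no data is stored.
    ToyRdd cache() {
        sc.registerPersistent(this);
        return this;
    }
}
```

Using `getAndIncrement()` on an `AtomicInteger` keeps ids unique even when RDDs are created from several threads, without explicit locking.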
So if we then created two more RDDs, you'd see 2 and 3, and if you then ran a job joining them, you'd expect 4, which can be confirmed by checking the UI.
We can also see that `join` creates a few new RDDs under the covers (5 and 6).
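The jump in numbering can be reproduced with a toy model in which one user-facing operation builds several nodes internally. This is a hypothetical sketch, not Spark's actual `join` implementation: the three-step shape of `toyJoin` (one combining node, then two derived nodes) is an assumption chosen only to mirror the 4, 5 and 6 seen above, and `LineageContext`, `Node` and `toyJoin` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class JoinIdsDemo {
    public static void main(String[] args) {
        LineageContext sc = new LineageContext();
        Node pairs1 = sc.source();
        Node pairs2 = sc.source();
        Node joined = sc.toyJoin(pairs1, pairs2);
        System.out.println(sc.createdIds()); // [0, 1, 2, 3, 4]
        System.out.println(joined.id);       // 4
    }
}

// Minimal lineage node: an id plus the nodes it was derived from.
final class Node {
    final int id;
    final List<Node> parents;

    Node(int id, List<Node> parents) {
        this.id = id;
        this.parents = parents;
    }
}

class LineageContext {
    private final AtomicInteger nextId = new AtomicInteger(0);
    private final List<Integer> created = new ArrayList<>();

    // Every node, user-facing or internal, draws from the same counter.
    private Node make(List<Node> parents) {
        Node n = new Node(nextId.getAndIncrement(), parents);
        created.add(n.id);
        return n;
    }

    Node source() {
        return make(List.of());
    }

    // Hypothetical join: one call, but three new nodes, so three ids are consumed.
    Node toyJoin(Node left, Node right) {
        Node combined = make(List.of(left, right));
        Node step = make(List.of(combined));
        return make(List.of(step));
    }

    List<Integer> createdIds() {
        return List.copyOf(created);
    }
}
```

Starting from an empty context, the inputs get 0 and 1 and the join consumes 2, 3 and 4. In a session where lower ids are already taken by earlier RDDs, the same pattern would show up as 2 and 3 for the inputs and 4 to 6 for the join.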