I found this link https://gist.github.com/BenFradet/c47c5c7247c5d5d0f076 which shows an implementation where a broadcast variable is updated in Spark. Is this a valid implementation, meaning will executors see the latest value of the broadcast variable?
The code you are referring to uses the Broadcast.unpersist() method. The Spark API docs for Broadcast.unpersist() say: "Asynchronously delete cached copies of this broadcast on the executors. If the broadcast is used after this is called, it will need to be re-sent to each executor." There is also an overloaded unpersist(blocking: Boolean) which blocks until unpersisting has completed.

So it depends on how you use the broadcast variable in your Spark application. Spark does not automatically re-broadcast when you mutate a broadcast variable; the driver has to resend it. The Spark documentation says you should not modify a broadcast variable (it is meant to be immutable) to avoid inconsistent processing across executor nodes, but unpersist() and destroy() are available if you want to control the broadcast variable's life cycle yourself. See the Spark JIRA https://issues.apache.org/jira/browse/SPARK-6404, and the sketch below.
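Here is a minimal sketch of the pattern the gist relies on: unpersist the old broadcast on the driver, then broadcast a fresh value, so that jobs submitted afterwards capture the new handle. The object name, `resolve` helper, and the data are illustrative assumptions, not the gist's exact code:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

object RebroadcastSketch {
  // Each job captures whichever Broadcast handle is passed in here;
  // executors resolve table.value against that specific broadcast.
  def resolve(rdd: RDD[String], table: Broadcast[Map[String, Int]]): Array[Int] =
    rdd.map(k => table.value.getOrElse(k, -1)).collect()

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "rebroadcast-sketch")
    val keys = sc.parallelize(Seq("a", "b"))

    var table = sc.broadcast(Map("a" -> 1))
    println(resolve(keys, table).mkString(","))   // 1,-1

    // The "update": delete the cached copies on the executors, then
    // broadcast a new value from the driver. blocking = true waits
    // until unpersisting has completed before re-broadcasting.
    table.unpersist(blocking = true)
    table = sc.broadcast(Map("a" -> 1, "b" -> 2))

    // Jobs submitted from here on serialize the new broadcast handle,
    // so executors see the updated value.
    println(resolve(keys, table).mkString(","))   // 1,2

    sc.stop()
  }
}
```

Note that this is a driver-side replacement of the broadcast, not a mutation visible to in-flight tasks: only jobs whose closures capture the new `Broadcast` object will see the new value.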