Is there a better way?
val mean = df.select(avg("date")).first().getDouble(0)
df.withColumn("mean", lit(mean))
I assume that it could worth it to avoid calling an action …
Is there a better way?
val mean = df.select(avg("date")).first().getDouble(0)
df.withColumn("mean", lit(mean))
I assume that it could worth it to avoid calling an action …
It is possible to avoid additional action using broadcast
with cross product:
import org.apache.spark.sql.functions.broadcast
df.crossJoin(broadcast(df.agg(avg("date"))))
or:
spark.conf.set("spark.sql.crossJoin.enabled", true)
df.join(broadcast(df.agg(avg("date"))))
What you shouldn't do is using window functions:
df.withColumn("avg", avg("date").over())