Is there a way to take the first 1000 rows of a Sp

Posted 2020-02-17 04:55

放荡不羁爱自由

Question:

I am using the randomSplit function to get a small sample of a dataframe for development purposes, and I end up just taking the first df that this function returns.

val df_subset = data.randomSplit(Array(0.00000001, 0.01), seed = 12345)(0)

If I use df.take(1000), I end up with an array of rows, not a dataframe, so that won't work for me.

Is there a better, simpler way to take, say, the first 1000 rows of the df and store them as another df?

Answer 1:

The method you are looking for is .limit.

Returns a new Dataset by taking the first n rows. The difference between this function and head is that head returns an array while limit returns a new Dataset.
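
As a rough sketch of the difference, the snippet below contrasts limit with take. The local SparkSession and the generated `data` DataFrame are assumptions standing in for the question's setup (in spark-shell, `spark` already exists), and the name `dfSubset` is only illustrative.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .appName("limit-example")
  .master("local[*]")
  .getOrCreate()

// Hypothetical stand-in for the question's `data` DataFrame.
val data: DataFrame = spark.range(0, 100000).toDF("id")

// limit(n) is a transformation: the result is still a DataFrame with at most n rows.
val dfSubset: DataFrame = data.limit(1000)

// take(n), like head(n), is an action: it collects the rows to the driver as an Array[Row].
val rows: Array[Row] = data.take(1000)

println(dfSubset.count()) // row count of the subset DataFrame
println(rows.length)      // length of the collected array
```

Note that without an explicit ordering (for example an orderBy before the limit), which rows end up in the subset is not guaranteed to be the same from run to run, which is usually acceptable for a development sample.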