Now I have 300+ columns in my RDD, but I found there is a need to dynamically select a range of columns and put them into the LabeledPoint data type. As a newbie to Spark, I am wondering if there is any indexed way to select a range of columns in an RDD, something like temp_data = data[, 101:211] in R. Is there something like val temp_data = data.filter(_.column_index in range(101:211)...?
Any thoughts are welcome and appreciated.
If it is a DataFrame, then something like this should work:
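The original code block did not survive, so here is a minimal sketch of that DataFrame approach. The helper name selectColumnRange is my own illustration; the 0-based slice (100, 211) is meant to mirror R's 1-based data[, 101:211]:

```scala
import org.apache.spark.sql.DataFrame

// Positional column slice: slice(from, until) is 0-based and
// end-exclusive, so (100, 211) covers R's 1-based columns 101..211.
def selectColumnRange(df: DataFrame, from: Int, until: Int): DataFrame =
  df.select(df.columns.slice(from, until).map(df.col): _*)

// usage, df being your 300+ column DataFrame:
// val temp_data = selectColumnRange(df, 100, 211)
```

df.columns is just an Array[String], so this is an ordinary Scala array slice followed by a select on the resulting column names.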
Kind of old thread, but I recently had to do something similar and searched around. I needed to select all but the last column, where I had 200+ columns.
Spark 1.4.1
Scala 2.10.4
Assuming you have an RDD of Array or any other Scala collection (e.g., List), you can do something like this:
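The answer's code block is missing here, so this is a minimal sketch under the stated assumptions: rows are an RDD[Array[Double]] and the label sits in the last column (the "all but the last column" case above). The function name toLabeledPoints is mine, and the MLlib types match the Spark 1.4.1 API mentioned earlier:

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Each Array[Double] holds the features plus the label in the last slot.
def toLabeledPoints(rows: RDD[Array[Double]]): RDD[LabeledPoint] =
  rows.map { row =>
    // slice(0, row.length - 1) keeps every column except the last
    val features = row.slice(0, row.length - 1)
    LabeledPoint(row.last, Vectors.dense(features))
  }
```

The same slice call also answers the original question: row.slice(100, 211) inside the map would pull columns 101 through 211 (1-based) out of each row.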