I wrote this:
df.select(col("colname")).distinct().collect.map(_.toString()).toList
The result is:
List("[2019-06-24]", "[2019-06-22]", "[2019-06-23]")
Whereas I want to get:
List("2019-06-24", "2019-06-22", "2019-06-23")
How can I change this?
You need to change .map(_.toString()) to .map(_.getAs[String]("colname")).
With .map(_.toString()), you are calling org.apache.spark.sql.Row.toString, which renders the row's fields inside square brackets. That's why the output looks like List("[2019-06-24]", "[2019-06-22]", "[2019-06-23]").
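To see the difference in isolation, here is a minimal sketch that needs only spark-sql on the classpath (no SparkSession). The object name `RowToStringDemo` is hypothetical. Rows built with `Row(...)` carry no schema, so this sketch reads the field by position with `getString(0)`. Lookup by name (`getAs[String]("colname")`) works on rows collected from a DataFrame, because those rows do carry a schema.

```scala
import org.apache.spark.sql.Row

object RowToStringDemo {
  // Schema-less rows, standing in for what collect() returns (assumption for illustration).
  val rows: Array[Row] = Array(Row("2019-06-24"), Row("2019-06-22"), Row("2019-06-23"))

  // Row.toString wraps the row's fields in square brackets.
  val viaToString: List[String] = rows.map(_.toString).toList

  // A positional accessor returns the field value itself.
  val viaGetString: List[String] = rows.map(_.getString(0)).toList

  def main(args: Array[String]): Unit = {
    println(viaToString)  // List([2019-06-24], [2019-06-22], [2019-06-23])
    println(viaGetString) // List(2019-06-24, 2019-06-22, 2019-06-23)
  }
}
```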
The correct way is:
val list = df.select("colname").distinct().collect().map(_.getAs[String]("colname")).toList
The output will be (distinct() does not guarantee any particular order):
List("2019-06-24", "2019-06-22", "2019-06-23")
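A minimal sketch of the same fix as a standalone program, assuming a local SparkSession and made-up sample data. The object and method names (`DistinctValues`, `byName`, `viaDataset`) are hypothetical. It also shows a common alternative: converting the single-column DataFrame to a `Dataset[String]` with `.as[String]`, so `collect()` returns Strings instead of Rows.

```scala
import org.apache.spark.sql.{DataFrame, Encoders, SparkSession}

object DistinctValues {
  // Option 1: collect Rows, then read the field by column name.
  // Rows collected from a DataFrame carry a schema, so name lookup works.
  def byName(df: DataFrame, column: String): List[String] =
    df.select(column).distinct().collect().map(_.getAs[String](column)).toList

  // Option 2: turn the single-column DataFrame into a Dataset[String]
  // (encoder passed explicitly), so collect() yields Strings directly.
  def viaDataset(df: DataFrame, column: String): List[String] =
    df.select(column).distinct().as[String](Encoders.STRING).collect().toList

  def main(args: Array[String]): Unit = {
    // Local session for experimenting; not a production configuration.
    val spark = SparkSession.builder().master("local[*]").appName("distinct-values").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data using the column name from the question.
    val df = Seq("2019-06-24", "2019-06-22", "2019-06-23", "2019-06-24").toDF("colname")

    // distinct() does not guarantee row order, so sort before printing.
    println(byName(df, "colname").sorted)     // List(2019-06-22, 2019-06-23, 2019-06-24)
    println(viaDataset(df, "colname").sorted) // List(2019-06-22, 2019-06-23, 2019-06-24)

    spark.stop()
  }
}
```

Option 2 avoids handling Row at all. It assumes the selected column really is a string column; otherwise the encoder cannot be resolved.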
Sample data:
import spark.implicits._ // needed for toDF outside spark-shell
val df = sc.parallelize(Seq("2019-06-24", "2019-06-22", "2019-06-23")).toDF("cn")
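Now select the column, take the first field of each Row with x(0), wrap it in double quotes, and convert it to a String.
Note that this puts literal quote characters inside each value; drop the quoting step if you only want the plain dates.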
df.select("cn").collect().map(x => x(0)).map(x => s""""$x"""".toString)
//res36: Array[String] = Array("2019-06-24", "2019-06-22", "2019-06-23")
Or, to get a List:
df.select("cn").collect().map(x => x(0)).map(x => s""""$x"""".toString).toList
//res37: List[String] = List("2019-06-24", "2019-06-22", "2019-06-23")
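A minimal sketch that makes the difference between the two results concrete, using schema-less Rows in place of the collected ones (an assumption for illustration; the object name `QuotedVsPlain` is hypothetical). The quoted version stores 12-character strings that include the `"` characters, while the plain version stores just the 10-character dates.

```scala
import org.apache.spark.sql.Row

object QuotedVsPlain {
  // Stand-ins for df.select("cn").collect() (assumption for illustration).
  val rows: Array[Row] = Array(Row("2019-06-24"), Row("2019-06-22"), Row("2019-06-23"))

  // Same approach as above: x(0) returns Any, and the interpolator wraps it in literal quotes.
  val quoted: List[String] = rows.map(x => x(0)).map(x => s""""$x"""").toList

  // Plain values: read the first field as a String, no extra characters.
  val plain: List[String] = rows.map(_.getString(0)).toList

  def main(args: Array[String]): Unit = {
    println(quoted.head.length) // 12
    println(plain.head.length)  // 10
  }
}
```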