Change column value in a dataframe spark scala

Published 2019-09-22 03:03

Question:

This is how my dataframe looks at the moment:

+------------+
|    DATE    |
+------------+
|    19931001|
|    19930404|
|    19930603|
|    19930805|
+------------+

I am trying to reformat this string value to yyyy-mm-dd hh:mm:ss.fff and keep it as a string, not a date or timestamp type.

How would I do that using the withColumn method?

Answer 1:

Here is a solution using a UDF and withColumn; I have assumed that you have a string date field in the DataFrame.

import java.text.SimpleDateFormat
import org.apache.spark.sql.functions.udf
import spark.implicits._

// Define the UDF before using it: "19931001" -> "1993/10/01" -> "1993-10-01 00:00:00"
val dateToTimeStamp = udf((date: String) => {
  val stringDate = date.substring(0, 4) + "/" + date.substring(4, 6) + "/" + date.substring(6, 8)
  val format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
  format.format(new SimpleDateFormat("yyyy/MM/dd").parse(stringDate))
})

// Create the dfList dataframe
val dfList = spark.sparkContext
  .parallelize(Seq("19931001", "19930404", "19930603", "19930805")).toDF("DATE")

dfList.withColumn("DATE", dateToTimeStamp($"DATE")).show()
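The conversion inside the UDF can be sanity-checked on its own in plain Scala, outside Spark. This is a standalone sketch of the same logic (the dateToTimestampString name is mine, not part of the original answer):

```scala
import java.text.SimpleDateFormat

// Same logic as the UDF body: slice the yyyyMMdd string, reparse, reformat
def dateToTimestampString(date: String): String = {
  val stringDate = date.substring(0, 4) + "/" + date.substring(4, 6) + "/" + date.substring(6, 8)
  val out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
  out.format(new SimpleDateFormat("yyyy/MM/dd").parse(stringDate))
}

dateToTimestampString("19931001") // "1993-10-01 00:00:00"
```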


Answer 2:

df.withColumn("date",
  from_unixtime(unix_timestamp($"date", "yyyyMMdd"), "yyyy-MM-dd HH:mm:ss.SSS"))

This should work. Also note that mm gives minutes while MM gives months (and milliseconds are SSS, not fff, in this pattern syntax). Since unix_timestamp only has second precision, the milliseconds will always come out as 000. Hope this helps you.
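The mm-versus-MM pitfall mentioned above is easy to demonstrate in plain Scala with SimpleDateFormat (a standalone illustration, not part of the original answer):

```scala
import java.text.SimpleDateFormat

// Parse "19931001" into a Date (midnight, so minutes are 0)
val parsed = new SimpleDateFormat("yyyyMMdd").parse("19931001")

// MM = month of year
val withMonths = new SimpleDateFormat("yyyy-MM-dd").format(parsed)  // "1993-10-01"
// mm = minute of hour, so the month position silently becomes "00"
val withMinutes = new SimpleDateFormat("yyyy-mm-dd").format(parsed) // "1993-00-01"
```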



Answer 3:

First, I created this DF:

val df = sc.parallelize(Seq("19931001","19930404","19930603","19930805")).toDF("DATE")

For date management we are going to use the Joda-Time library (don't forget to add the joda-time.jar file to the classpath).

import org.joda.time.format.DateTimeFormat

def func(s: String): String = {
  // In Joda-Time patterns mm means minutes; months are MM
  val dateFormat = DateTimeFormat.forPattern("yyyyMMdd")
  val resultDate = dateFormat.parseDateTime(s)
  // Render in the format the question asked for
  resultDate.toString("yyyy-MM-dd HH:mm:ss.SSS")
}

Finally, apply the function to the dataframe:

val temp = df.map(l => func(l.getString(0)))
val df2 = temp.toDF("DATE")
df2.show()

This answer still needs some work, as I am new to Spark myself, but I think it gets the job done!
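Since the goal is only string-to-string reformatting, the same conversion can also be sketched with the JDK's built-in java.time API, which needs no extra jar. The reformat helper name here is my own; inside Spark it would be wrapped in a UDF just like in Answer 1:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Parse the compact date, pad with midnight, and print the requested pattern
def reformat(s: String): String =
  LocalDate.parse(s, DateTimeFormatter.ofPattern("yyyyMMdd"))
    .atStartOfDay()
    .format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS"))

reformat("19931001") // "1993-10-01 00:00:00.000"
```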