How to convert Timestamp to Date format in DataFrame

Posted 2019-03-18 08:04

Question:

I have a DataFrame with a Timestamp column, which I need to convert to Date format.

Are there any Spark SQL functions available for this?

Answer 1:

You can cast the column to date:

Scala:

import org.apache.spark.sql.types.DateType

val newDF = df.withColumn("dateColumn", df("timestampColumn").cast(DateType))

PySpark:

df = df.withColumn('dateColumn', df['timestampColumn'].cast('date'))
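Put together, a minimal runnable Scala sketch of the cast approach (the sample data and the column names timestampColumn and dateColumn are placeholders, not from the question):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.DateType

val spark = SparkSession.builder().appName("castDemo").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data with one timestamp column.
val df = Seq(
    ("a", java.sql.Timestamp.valueOf("2019-03-18 08:04:00")),
    ("b", java.sql.Timestamp.valueOf("2019-03-19 09:15:30")))
  .toDF("stuff", "timestampColumn")

// Casting a timestamp column to DateType drops the time-of-day portion.
val newDF = df.withColumn("dateColumn", df("timestampColumn").cast(DateType))

newDF.printSchema
// root
//  |-- stuff: string (nullable = true)
//  |-- timestampColumn: timestamp (nullable = true)
//  |-- dateColumn: date (nullable = true)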


Answer 2:

In Spark SQL:

SELECT
  CAST(the_ts AS DATE) AS the_date
FROM the_table
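To run this from Scala, register the DataFrame as a temporary view first. A minimal sketch, assuming a DataFrame df with a timestamp column the_ts (the view name the_table matches the query above):

// Expose df to Spark SQL under the name used in the query.
df.createOrReplaceTempView("the_table")

val theDates = spark.sql("SELECT CAST(the_ts AS DATE) AS the_date FROM the_table")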


Answer 3:

Imagine the following input:

import org.apache.spark.sql.functions.current_timestamp
import spark.implicits._ // enables the $"colName" column syntax used below

val dataIn = spark.createDataFrame(Seq(
        (1, "some data"),
        (2, "more data")))
    .toDF("id", "stuff")
    .withColumn("ts", current_timestamp())

dataIn.printSchema
root
 |-- id: integer (nullable = false)
 |-- stuff: string (nullable = true)
 |-- ts: timestamp (nullable = false)

You can use the to_date function:

val dataOut = dataIn.withColumn("date", to_date($"ts"))

dataOut.printSchema
root
 |-- id: integer (nullable = false)
 |-- stuff: string (nullable = true)
 |-- ts: timestamp (nullable = false)
 |-- date: date (nullable = false)

dataOut.show(false)
+---+---------+-----------------------+----------+
|id |stuff    |ts                     |date      |
+---+---------+-----------------------+----------+
|1  |some data|2017-11-21 16:37:15.828|2017-11-21|
|2  |more data|2017-11-21 16:37:15.828|2017-11-21|
+---+---------+-----------------------+----------+

I would recommend these built-in functions over casting and plain SQL.
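As a side note: if the source column is a string rather than a timestamp, to_date also accepts a format pattern (Spark 2.2+). A minimal sketch, with a hypothetical string column ts_string:

import org.apache.spark.sql.functions.to_date

// ts_string is a hypothetical column holding dates as strings, e.g. "2017/11/21".
val parsed = dataIn.withColumn("date", to_date($"ts_string", "yyyy/MM/dd"))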