Question:
I am trying to convert a column from String format to Date format using the to_date function, but it's returning null values.
df.createOrReplaceTempView("incidents")
spark.sql("select Date from incidents").show()
+----------+
|      Date|
+----------+
|08/26/2016|
|08/26/2016|
|08/26/2016|
|06/14/2016|
+----------+
spark.sql("select to_date(Date) from incidents").show()
+---------------------------+
|to_date(CAST(Date AS DATE))|
+---------------------------+
| null|
| null|
| null|
| null|
The Date column is in String format:
|-- Date: string (nullable = true)
Answer 1:
Use to_date with a Java SimpleDateFormat pattern:
TO_DATE(CAST(UNIX_TIMESTAMP(date, 'MM/dd/yyyy') AS TIMESTAMP))
Example:
spark.sql("""
SELECT TO_DATE(CAST(UNIX_TIMESTAMP('08/26/2016', 'MM/dd/yyyy') AS TIMESTAMP)) AS newdate"""
).show()
+----------+
| dt|
+----------+
|2016-08-26|
+----------+
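Applied to the incidents view from the question, a minimal sketch (assuming the same Date column as above) would be:
// Sketch: parse the MM/dd/yyyy strings in the incidents view from the question.
spark.sql("""
  SELECT TO_DATE(CAST(UNIX_TIMESTAMP(Date, 'MM/dd/yyyy') AS TIMESTAMP)) AS Date
  FROM incidents
""").show()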
Answer 2:
I solved the same problem without the temp table/view and with DataFrame functions. Of course I found that only one format works with this solution, and that's yyyy-MM-dd.
For example:
import org.apache.spark.sql.functions.col
val df = sc.parallelize(Seq("2016-08-26")).toDF("Id")
val df2 = df.withColumn("Timestamp", col("Id").cast("timestamp"))
val df3 = df2.withColumn("Date", col("Id").cast("date"))
df3.printSchema
root
|-- Id: string (nullable = true)
|-- Timestamp: timestamp (nullable = true)
|-- Date: date (nullable = true)
df3.show
+----------+--------------------+----------+
| Id| Timestamp| Date|
+----------+--------------------+----------+
|2016-08-26|2016-08-26 00:00:...|2016-08-26|
+----------+--------------------+----------+
The timestamp of course has 00:00:00.0 as a time value.
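For illustration, a string in any other format simply casts to null rather than raising an error (a minimal sketch, assuming the same Spark shell session):
import org.apache.spark.sql.functions.col
// Sketch: casting a non-ISO string such as MM/dd/yyyy to date yields null, not an error.
val dfBad = sc.parallelize(Seq("08/26/2016")).toDF("Id")
dfBad.withColumn("Date", col("Id").cast("date")).show()
// +----------+----+
// |        Id|Date|
// +----------+----+
// |08/26/2016|null|
// +----------+----+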
Answer 3:
Since your main aim was to convert the type of a column in a DataFrame from String to Timestamp, I think this approach would be better.
import org.apache.spark.sql.functions.{to_date, to_timestamp}
val modifiedDF = DF.withColumn("Date", to_date($"Date", "MM/dd/yyyy"))
You could also use to_timestamp (I think this is available from Spark 2.x) if you require a fine-grained timestamp.
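For example, a minimal sketch (the time-of-day values like '08/26/2016 14:30:00' are hypothetical here, assuming the Date strings carry a time component):
// Sketch: to_timestamp preserves the time of day, unlike to_date.
import org.apache.spark.sql.functions.to_timestamp
val modifiedTsDF = DF.withColumn("Timestamp", to_timestamp($"Date", "MM/dd/yyyy HH:mm:ss"))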
Answer 4:
You can also run this query:
sqlContext.sql("""
select from_unixtime(unix_timestamp('08/26/2016', 'MM/dd/yyyy'), 'yyyy:MM:dd') as new_format
""").show()
Answer 5:
You can also pass the date format:
df.withColumn("Date",to_date(unix_timestamp(df.col("your_date_column"), "your_date_format").cast("timestamp")))
For example:
import org.apache.spark.sql.functions._
val df = sc.parallelize(Seq("06 Jul 2018")).toDF("dateCol")
df.withColumn("Date",to_date(unix_timestamp(df.col("dateCol"), "dd MMM yyyy").cast("timestamp")))
Answer 6:
dateID is an int column containing the date as an integer in yyyyMMdd form. Note the uppercase MM for months (lowercase mm means minutes in SimpleDateFormat):
spark.sql("SELECT from_unixtime(unix_timestamp(cast(dateid as string), 'yyyyMMdd'), 'yyyy-MM-dd') from XYZ").show(50, false)
Answer 7:
The solution proposed above by Sai Kiriti Badam worked for me.
I'm using Azure Databricks to read data captured from an EventHub. This contains a string column named EnqueuedTimeUtc with the following format...
12/7/2018 12:54:13 PM
I'm using a Python notebook and used the following...
import pyspark.sql.functions as func
sports_messages = sports_df.withColumn("EnqueuedTimestamp", func.to_timestamp("EnqueuedTimeUtc", "MM/dd/yyyy hh:mm:ss aaa"))
... to create a new column EnqueuedTimestamp of type "timestamp" with data in the following format...
2018-12-07 12:54:13
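One caveat worth noting: Spark 3.x switched datetime parsing from Java's SimpleDateFormat to java.time DateTimeFormatter patterns, so some legacy patterns may need adjusting (or set spark.sql.legacy.timeParserPolicy to LEGACY to restore the old behavior).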