How to get the months/years difference between two dates

Posted 2019-01-29 07:31

Question:

I am getting the error:

org.apache.spark.sql.AnalysisException: cannot resolve 'year'

My input data:

1,2012-07-21,2014-04-09

My code:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
case class c (id:Int,start:String,end:String)
val c1 = sc.textFile("date.txt")
val c2 = c1.map(_.split(",")).map(r=>(c(r(0).toInt,r(1).toString,r(2).toString)))
val c3 = c2.toDF();
c3.registerTempTable("c4")
val r = sqlContext.sql("select id,datediff(year,to_date(end), to_date(start)) AS date from c4")

What can I do to resolve the above error?

I have tried the following code, but it gives the output in days and I need it in years:

val r = sqlContext.sql("select id,datediff(to_date(end), to_date(start)) AS date from c4")

Please advise whether I can use a function like to_date to get the difference in years.

Answer 1:

val r = sqlContext.sql("select id,datediff(year,to_date(end), to_date(start)) AS date from c4")

In the above code, "year" is not a column in the DataFrame; that is, it is not a valid column in table "c4". The AnalysisException is thrown because the query is invalid: it cannot resolve a column named "year".
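
Note that Spark SQL's datediff takes only two date arguments and returns the difference in days, so there is no datediff(year, ...) form as in some other SQL dialects. A minimal sketch of a corrected query against the same temp table, assuming a simple calendar-year subtraction is acceptable:

val r = sqlContext.sql(
  "select id, year(to_date(end)) - year(to_date(start)) as yearDifference from c4")
r.show()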

Using a Spark user-defined function (UDF) would be a more robust approach, as sketched below.
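
For example, a minimal sketch of that approach, assuming Java 8's java.time is available; the UDF name yearsBetween and the column name yearDifference are illustrative choices, not part of the original code:

import java.time.LocalDate
import java.time.temporal.ChronoUnit
import org.apache.spark.sql.functions.{col, udf}

// Parse the ISO date strings and count the complete years between them
val yearsBetween = udf((start: String, end: String) =>
  ChronoUnit.YEARS.between(LocalDate.parse(start), LocalDate.parse(end)))

c3.withColumn("yearDifference", yearsBetween(col("start"), col("end"))).show()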



Answer 2:

Another simple way is to cast the string columns to DateType in Spark SQL and then apply the SQL date and time functions to them, like the following:

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

// Cast the string columns to DateType so the date/time functions can be applied
val c4 = c3.select(col("id"), col("start").cast(DateType), col("end").cast(DateType))

c4.withColumn("dateDifference", datediff(col("end"), col("start")))        // difference in days
  .withColumn("monthDifference", months_between(col("end"), col("start"))) // fractional months
  .withColumn("yearDifference", year(col("end")) - year(col("start")))     // calendar-year difference
  .show()
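
If whole elapsed years are wanted rather than a calendar-year subtraction, one possible variant (the column name fullYears is illustrative) builds on months_between, which returns fractional months:

c4.withColumn("fullYears", floor(months_between(col("end"), col("start")) / 12)).show()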