This seems simple, but I couldn't find the answer. I'm trying to convert a date-of-birth column in the format below to a date type using the Spark DataFrame API, and then calculate the corresponding ages. I will probably need the system date as well. I have found some Java libraries that may be useful, but I am still having difficulty using them with the DataFrame API.
23-AUG-67
28-FEB-66
09-APR-59
9/10/2015 Edit: I just found that Spark 1.5.0 adds "Date Time Functions", which will be helpful in the future when 1.5.0 is released here. Unfortunately, it doesn't work with the current Spark version on AWS EMR.
9/10/2015 Evening Edit:
I was able to convert the date of birth into age using the below code.
Note that the getYear() function is deprecated, but as far as I can tell it works fine.
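import java.sql.Date
import java.text.SimpleDateFormat
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.udf

val sqlsc = new SQLContext(sc)
import sqlsc.implicits._  // enables the 'dob column syntax used below

// Current date, captured once on the driver
val epoch = System.currentTimeMillis
val curDate = new Date(epoch)
// Matches values such as 23-AUG-67
val dtFormat = new SimpleDateFormat("dd-MMM-yy")

val dobToAge = udf( (dob: String) => {
  val javaUtilDate = dtFormat.parse(dob)
  val sqlDate = new Date(javaUtilDate.getTime())
  // Year difference only: one too high if this year's birthday is still ahead
  curDate.getYear - sqlDate.getYear
})

inputdata.withColumn("AGE", dobToAge('dob))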
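
Subtracting the years gives an age that is one too high for anyone whose birthday has not yet come round this year. Here is a minimal sketch of a calculation that counts completed years only, assuming Java 8 or later is available for java.time. The helper name ageInYears and the rule for resolving two-digit years are assumptions made for illustration, not something from the code above.

```scala
import java.time.{LocalDate, Period}
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.time.temporal.ChronoField
import java.util.Locale

// Hypothetical helper, not part of the original code.
// Assumption: a two-digit year is resolved into the 100 years ending at
// today's year, which suits birth dates ("67" -> 1967 when today is in 2015).
def ageInYears(dob: String, today: LocalDate): Int = {
  val fmt: DateTimeFormatter = new DateTimeFormatterBuilder()
    .parseCaseInsensitive()        // accept "AUG" as well as "Aug"
    .appendPattern("dd-MMM-")
    .appendValueReduced(ChronoField.YEAR, 2, 2, today.getYear - 99)
    .toFormatter(Locale.ENGLISH)   // month abbreviations are English
  val birth = LocalDate.parse(dob, fmt)
  // Period.between counts completed years, so a birthday that is still
  // ahead in the current year is not counted
  Period.between(birth, today).getYears
}
```

If this fits your setup, it could be wrapped the same way as the UDF above, for example udf((dob: String) => ageInYears(dob, LocalDate.now())).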