Convert Date of Birth into Age in Spark Dataframe

Posted 2019-02-20 05:40

Question:

This seems simple, but I couldn't find the answer. I'm trying to use the Spark DataFrame API to convert a date-of-birth column in the format below into a date type, and then calculate the corresponding ages. I probably need the current system date as well. I have found some Java libraries that may be useful, but I am still having difficulty using them with the DataFrame API.

23-AUG-67
28-FEB-66
09-APR-59
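A minimal sketch of parsing these sample values with java.time (Java 8+), written in Scala to match the code further down. The object name `DobParsing` is hypothetical, and so is the choice of 1900 as the base for two-digit years, which assumes every birth year falls in 1900-1999. `SimpleDateFormat`'s `yy`, by contrast, uses a sliding window of 80 years before and 20 years after the current date.

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.time.temporal.ChronoField
import java.util.Locale

object DobParsing {
  // Parses strings such as "23-AUG-67".
  // Assumption: two-digit years map to 1900-1999 (base value 1900).
  val dobFormat: DateTimeFormatter = new DateTimeFormatterBuilder()
    .parseCaseInsensitive()                           // accept "AUG" as well as "Aug"
    .appendPattern("dd-MMM-")                         // day, short month name, literal dashes
    .appendValueReduced(ChronoField.YEAR, 2, 2, 1900) // "67" -> 1967
    .toFormatter(Locale.ENGLISH)                      // English month abbreviations

  def parseDob(s: String): LocalDate = LocalDate.parse(s, dobFormat)

  def main(args: Array[String]): Unit =
    Seq("23-AUG-67", "28-FEB-66", "09-APR-59").foreach { s =>
      println(s"$s -> ${parseDob(s)}") // e.g. 23-AUG-67 -> 1967-08-23
    }
}
```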

9/10/2015 Edit: I just found that Spark 1.5.0 adds "Date Time Functions", which will be helpful once 1.5.0 is released here. Unfortunately, they don't work with the current Spark version on AWS EMR.

9/10/2015 Evening Edit: I was able to convert the date of birth into age using the code below.
Note that the getYear() method is deprecated, but as far as I can tell it works fine.

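import java.sql.Date
import java.text.SimpleDateFormat
import java.util.Locale
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.udf

val sqlsc = new SQLContext(sc)
import sqlsc.implicits._ // provides the 'dob column syntax

// Current date, evaluated once on the driver
val epoch = System.currentTimeMillis
val curDate = new Date(epoch)
// Parses values such as "23-AUG-67" using English month abbreviations
val dtFormat = new SimpleDateFormat("dd-MMM-yy", Locale.ENGLISH)

val dobToAge = udf( (dob: String) => {
  val javaUtilDate = dtFormat.parse(dob)
  val sqlDate = new Date(javaUtilDate.getTime())
  // Deprecated getYear(): subtracts calendar years only, ignoring month and day
  curDate.getYear - sqlDate.getYear
})

inputdata.withColumn("AGE", dobToAge('dob))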

Answer 1:

Instead of using the deprecated getXXX methods of java.util.Date, you should use java.util.Calendar.

Also, your solution doesn't work in all cases. For someone born on December 31, 1976, the age on January 1, 2015 is computed as 2015 - 1976 = 39, even though that person is still 38 and won't turn 39 for almost a full year.

You should instead use a computation like the one shown in: http://howtodoinjava.com/2014/05/26/java-code-to-calculate-age-from-date-of-birth/ (converting the Java code to Scala shouldn't be much of a problem).
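A minimal Scala sketch of the kind of computation described in this answer. It uses java.util.Calendar rather than the deprecated getYear() and subtracts one year when the birthday has not yet occurred. It is not a port of the linked article. The object name `CalendarAge` and the `date` helper are hypothetical; `date` exists only to build example dates in the default time zone.

```scala
import java.util.{Calendar, Date, GregorianCalendar}

object CalendarAge {
  // Age in completed years on `today`, read through Calendar fields.
  def ageOn(dob: Date, today: Date): Int = {
    val birth = Calendar.getInstance()
    birth.setTime(dob)
    val now = Calendar.getInstance()
    now.setTime(today)

    val years = now.get(Calendar.YEAR) - birth.get(Calendar.YEAR)
    // Has this year's birthday (month, then day) already been reached?
    val birthdayPassed =
      now.get(Calendar.MONTH) > birth.get(Calendar.MONTH) ||
        (now.get(Calendar.MONTH) == birth.get(Calendar.MONTH) &&
          now.get(Calendar.DAY_OF_MONTH) >= birth.get(Calendar.DAY_OF_MONTH))
    if (birthdayPassed) years else years - 1
  }

  // Hypothetical helper: Calendar months are 0-based, so subtract 1.
  def date(y: Int, m: Int, d: Int): Date =
    new GregorianCalendar(y, m - 1, d).getTime

  def main(args: Array[String]): Unit = {
    val dob = date(1976, 12, 31)
    println(ageOn(dob, date(2015, 1, 1)))   // 38
    println(ageOn(dob, date(2015, 12, 31))) // 39
  }
}
```

In the question's UDF, `ageOn(sqlDate, curDate)` could replace the `curDate.getYear - sqlDate.getYear` expression, since java.sql.Date is a subclass of java.util.Date.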



Answer 2:

I was able to convert the date-of-birth column to age using a UDF with java.sql.Date. Please see the edit in the question for details.



Answer 3:

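Using Spark SQL (this assumes dateOfBirth is stored as yyyy-MM-dd; dividing the day count by 365 gives an approximate, fractional age):

select datediff(current_date(),
         TO_DATE(CAST(UNIX_TIMESTAMP(dateOfBirth, 'yyyy-MM-dd') AS TIMESTAMP))) / 365 as age
  from <TABLE_NAME>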
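Dividing a day count by 365 ignores leap days, so the whole-year part can run ahead of the true age just before a birthday. A small Scala sketch using java.time makes this concrete; `DaysOver365` and its methods are hypothetical names, and `approxAge` uses integer division, which corresponds to taking the floor of the SQL result.

```scala
import java.time.{LocalDate, Period}
import java.time.temporal.ChronoUnit

object DaysOver365 {
  // Mirrors the SQL idea: days between the dates divided by 365 (floored here).
  def approxAge(dob: LocalDate, today: LocalDate): Long =
    ChronoUnit.DAYS.between(dob, today) / 365

  // Exact age in completed years.
  def exactAge(dob: LocalDate, today: LocalDate): Int =
    Period.between(dob, today).getYears

  def main(args: Array[String]): Unit = {
    val dob = LocalDate.of(1976, 12, 31)
    val dayBefore39th = LocalDate.of(2015, 12, 30)
    println(ChronoUnit.DAYS.between(dob, dayBefore39th)) // 14243 (includes 9 leap days)
    println(approxAge(dob, dayBefore39th))               // 39
    println(exactAge(dob, dayBefore39th))                // 38
  }
}
```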


Answer 4:

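Using Joda-Time (Java):

import org.joda.time.LocalDate;
import org.joda.time.Years;

LocalDate birthdate = new LocalDate(1970, 1, 20);
LocalDate now = new LocalDate(); // today, in the default time zone
Years age = Years.yearsBetween(birthdate, now); // age.getYears() gives the int value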
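On Java 8 and later, the standard java.time package offers a comparable calculation without the Joda-Time dependency. A minimal Scala sketch, assuming `ChronoUnit.YEARS.between` is an acceptable stand-in for `Years.yearsBetween`; the object and method names are hypothetical.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

object JavaTimeAge {
  // Counts complete years between the two dates, similar to Years.yearsBetween.
  def yearsBetween(birthdate: LocalDate, now: LocalDate): Long =
    ChronoUnit.YEARS.between(birthdate, now)

  def main(args: Array[String]): Unit = {
    val birthdate = LocalDate.of(1970, 1, 20)
    println(yearsBetween(birthdate, LocalDate.now()))
    println(yearsBetween(birthdate, LocalDate.of(2015, 1, 19))) // 44
    println(yearsBetween(birthdate, LocalDate.of(2015, 1, 20))) // 45
  }
}
```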