Spark SQL is not converting timezone correctly [du

2020-07-06 05:19发布

问题:

Using Scala 2.10.4 and spark 1.5.1 and spark 1.6

sqlContext.sql(
  """
    |select id,
    |to_date(from_utc_timestamp(from_unixtime(at), 'US/Pacific')),
    |from_utc_timestamp(from_unixtime(at), 'US/Pacific'),
    |from_unixtime(at),
    |to_date(from_unixtime(at)),
    | at
    |from events
    | limit 100
  """.stripMargin).collect().foreach(println)

Spark-Submit options: --driver-java-options '-Duser.timezone=US/Pacific'

result:

[56d2a9573bc4b5c38453eae7,2016-02-28,2016-02-27 16:01:27.0,2016-02-28 08:01:27,2016-02-28,1456646487]
[56d2aa1bfd2460183a571762,2016-02-28,2016-02-27 16:04:43.0,2016-02-28 08:04:43,2016-02-28,1456646683]
[56d2aaa9eb63bbb63456d5b5,2016-02-28,2016-02-27 16:07:05.0,2016-02-28 08:07:05,2016-02-28,1456646825]
[56d2aab15a21fa5f4c4f42a7,2016-02-28,2016-02-27 16:07:13.0,2016-02-28 08:07:13,2016-02-28,1456646833]
[56d2aac8aeeee48b74531af0,2016-02-28,2016-02-27 16:07:36.0,2016-02-28 08:07:36,2016-02-28,1456646856]
[56d2ab1d87fd3f4f72567788,2016-02-28,2016-02-27 16:09:01.0,2016-02-28 08:09:01,2016-02-28,1456646941]

The time in US/Pacific should be 2016-02-28 00:01:27 etc but some how it subtracts "8" hours twice

回答1:

after reading for sometime following are the conclusions:

  • Spark-Sql doesn't support date-time, and nor timezones
  • Using timestamp is the only solution
  • from_unixtime(at) parses the epoch time correctly, just that the printing of it as a string changes it due to timezone. It is safe to assume that the from_unixtime will convert it correctly ( although printing it might show different results)
  • from_utc_timestamp will shift ( not just convert) the timestamp to that timezone, in this case it will subtract 8 hours to the time since (-08:00)
  • printing sql results messes up the times with respect to timezone param


回答2:

For the record, here we convert Long values like that using an UDF.

For our purpose, we are interested in only the Date string representation of the timestamp (in ms since epoch in UTC)

val udfToDateUTC = udf((epochMilliUTC: Long) => {
  val dateFormatter = java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(java.time.ZoneId.of("UTC"))
  dateFormatter.format(java.time.Instant.ofEpochMilli(epochMilliUTC))
})

This way, we control the parsing as well as the rendering of the dates.