I am using Scala 2.10.4 with Spark 1.5.1 and Spark 1.6, and running the following query:
sqlContext.sql(
  """
    |select id,
    |       to_date(from_utc_timestamp(from_unixtime(at), 'US/Pacific')),
    |       from_utc_timestamp(from_unixtime(at), 'US/Pacific'),
    |       from_unixtime(at),
    |       to_date(from_unixtime(at)),
    |       at
    |from events
    |limit 100
  """.stripMargin).collect().foreach(println)
Spark-Submit options:
--driver-java-options '-Duser.timezone=US/Pacific'
Result:
[56d2a9573bc4b5c38453eae7,2016-02-28,2016-02-27 16:01:27.0,2016-02-28 08:01:27,2016-02-28,1456646487]
[56d2aa1bfd2460183a571762,2016-02-28,2016-02-27 16:04:43.0,2016-02-28 08:04:43,2016-02-28,1456646683]
[56d2aaa9eb63bbb63456d5b5,2016-02-28,2016-02-27 16:07:05.0,2016-02-28 08:07:05,2016-02-28,1456646825]
[56d2aab15a21fa5f4c4f42a7,2016-02-28,2016-02-27 16:07:13.0,2016-02-28 08:07:13,2016-02-28,1456646833]
[56d2aac8aeeee48b74531af0,2016-02-28,2016-02-27 16:07:36.0,2016-02-28 08:07:36,2016-02-28,1456646856]
[56d2ab1d87fd3f4f72567788,2016-02-28,2016-02-27 16:09:01.0,2016-02-28 08:09:01,2016-02-28,1456646941]
The time in US/Pacific should be 2016-02-28 00:01:27 for the first row (and likewise for the others), but somehow 8 hours are subtracted twice.
For the record, here we convert Long values like that using a UDF.
For our purposes, we are only interested in the date string representation of the timestamp (given in ms since the epoch, in UTC).
This way, we control both the parsing and the rendering of the dates.
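A minimal sketch of what such a UDF could look like, assuming plain java.text/java.util classes. The names EpochToDate, toDateString and to_local_date are hypothetical and not taken from the original post, and the registration lines are left as comments because they assume an existing sqlContext:

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

object EpochToDate {
  // Hypothetical helper: renders an instant (ms since epoch, UTC) as a
  // yyyy-MM-dd string in an explicitly chosen zone, so the result does not
  // depend on the default timezone of the driver or executor JVM.
  def toDateString(epochMillis: Long, zoneId: String): String = {
    // SimpleDateFormat is not thread-safe, so a new instance is created per call.
    val fmt = new SimpleDateFormat("yyyy-MM-dd")
    fmt.setTimeZone(TimeZone.getTimeZone(zoneId))
    fmt.format(new Date(epochMillis))
  }
}

// Possible registration (assumption: `at` is a Long column holding seconds,
// as in the question, hence the multiplication by 1000):
// sqlContext.udf.register("to_local_date",
//   (at: Long) => EpochToDate.toDateString(at * 1000L, "US/Pacific"))
// sqlContext.sql("select id, to_local_date(at) from events limit 100")
```

Because the function returns a String, nothing is re-rendered in another timezone when the rows are collected and printed.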
After reading for some time, these are my conclusions:
from_unixtime(at) parses the epoch time correctly; only its printed string form changes, depending on the timezone. It is safe to assume that from_unixtime converts the value correctly (although printing it might show different results).
from_utc_timestamp shifts (not just converts) the timestamp to that timezone; in this case it subtracts 8 hours from the time, since the offset is -08:00.
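The double subtraction reported in the question can be reproduced with plain JDK classes. This is only a sketch of the arithmetic, not Spark's actual code: render and shiftByZoneOffset are hypothetical helpers, and it assumes the shifted value ends up being printed by a JVM whose timezone is US/Pacific (as set through --driver-java-options):

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

object DoubleShiftDemo {
  // Formats an instant (ms since epoch) as a wall-clock string in the given zone.
  def render(epochMillis: Long, zoneId: String): String = {
    val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    fmt.setTimeZone(TimeZone.getTimeZone(zoneId))
    fmt.format(new Date(epochMillis))
  }

  // Mimics the "shift" described above: moves the instant itself by the
  // zone's UTC offset (-8 hours for US/Pacific on this date).
  def shiftByZoneOffset(epochMillis: Long, zoneId: String): Long =
    epochMillis + TimeZone.getTimeZone(zoneId).getOffset(epochMillis)

  // First row of the result, converted from seconds to milliseconds.
  val at: Long = 1456646487L * 1000L

  val inUtc: String = render(at, "UTC")            // 2016-02-28 08:01:27
  val inPacific: String = render(at, "US/Pacific") // 2016-02-28 00:01:27 (the expected value)

  // Shift once, then print in a Pacific JVM: the offset is applied a second time.
  val shiftedThenPrinted: String =
    render(shiftByZoneOffset(at, "US/Pacific"), "US/Pacific") // 2016-02-27 16:01:27
}
```

The last value matches the third column of the first result row, which is consistent with one shift coming from from_utc_timestamp and a second one from printing the collected timestamp in the driver's timezone.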