可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have some unix times that I convert to timestamps in sparklyr and for some reasons I also need to convert them into strings.

Unfortunately, it seems that during the conversion to string hive converts to EST (my locale).

df_new <- spark_read_parquet(sc, "/mypath/parquet_*",
                             overwrite = TRUE,
                             name = "df_new",
                             memory = FALSE,
                             options = list(mergeSchema = "true"))
> df_new %>%  
           mutate(unix_t = from_utc_timestamp(timestamp(t) ,'UTC'),
           date_str = date_format(unix_t, 'yyyy-MM-dd HH:mm:ss z'),
           date_alt = to_date(from_utc_timestamp(timestamp(t) ,'UTC'))) %>% 
    select(t, unix_t, date_str, date_alt) %>% head(5)
# Source:   lazy query [?? x 4]
# Database: spark_connection
            t unix_t              date_str                date_alt  
        <dbl> <dttm>              <chr>                   <date>    
1 1419547405. 2014-12-25 22:43:25 2014-12-25 17:43:25 EST 2014-12-25
2 1418469714. 2014-12-13 11:21:54 2014-12-13 06:21:54 EST 2014-12-13
3 1419126103. 2014-12-21 01:41:43 2014-12-20 20:41:43 EST 2014-12-20
4 1419389856. 2014-12-24 02:57:36 2014-12-23 21:57:36 EST 2014-12-23
5 1418271811. 2014-12-11 04:23:31 2014-12-10 23:23:31 EST 2014-12-10

As you can see both date_str and date_alt use the EST timezone. I need UTC here. How can I do that?

Thanks!

回答1:

From the Hive function reference, date_format uses Java's SimpleDateFormat, which I believe always defaults to the JVM time zone, this explains why this gets you a character string converted to your time zone.

One option is to detect the time zone and manually add the hours to get UTC.

Another option would be to use lubridate with spark_apply():

sdf_len(sc, 1) %>%
  mutate(unix_t = from_utc_timestamp(timestamp(1522371003) , 'UDT')) %>%
  spark_apply(
    function(e) {
      dplyr::mutate(
        e,
        time_str = as.character(
          lubridate::with_tz(
            as.POSIXct(unix_t, origin="1970-01-01"),
            "GMT"
          )
        )
      )
    },
    columns = c("id", "unix_t", "time_str"))

回答2:

It's possible that sparklyr is doing some weird translation of timezones into the hive functions. I'd try registering the dataframe as a table and doing the manipulation with pure HQL:

createOrReplaceTempView(df_new, "df_new")
result <- sql("select from_utc_timestamp(timestamp(t) ,'UTC'),
   cast(from_utc_timestamp(timestamp(t) ,'UTC') as STRING),
   cast(from_utc_timestamp(timestamp(t) ,'UTC') as DATE)
   from df_new")
head(result)

edit

If you're unfamiliar with SQL-languages, you can add any of the variables from df_new as a comma separated list like so (and rename your selections with as)

select var1, var2, t,
   from_utc_timestamp(timestamp(t) ,'UTC') as unix_t,
   cast(from_utc_timestamp(timestamp(t) ,'UTC') as STRING) as date_str,
   cast(from_utc_timestamp(timestamp(t) ,'UTC') as DATE) as date_alt
   from df_new

You can also use * to represent all variables from the data frame:

select *,
   from_utc_timestamp(timestamp(t) ,'UTC') as unix_t,
   cast(from_utc_timestamp(timestamp(t) ,'UTC') as STRING) as date_str,
   cast(from_utc_timestamp(timestamp(t) ,'UTC') as DATE) as date_alt
   from df_new

回答3:

Try using as.POSIXct() ?

format(as.POSIXct(unix_t, origin = unix_t, tz = "UTC", usetz=TRUE),"%Y-%m-%d %H:mm:ss")

This will first convert unix timestamp to UTC and then formatted to desired string.

how to convert a timestamp into string (without ch

问题:

回答1:

回答2:

回答3:

收藏的人(0)

how to convert a timestamp into string (without ch

问题:

回答1:

回答2:

回答3:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮