Sort Datetime data by day, but from 4PM to 4PM

Posted 2019-08-04 23:38

Question:

I have Tweets about companies from various times of day, and I want to group them all by day. I have already done this. However, I want each day to run not from 00:00 until 23:59, but instead from 16:00 until 15:59 (because of the NYSE open hours).

Tweets (Negative, Neutral and Positive are the sentiment counts):

 Company,Datetime_UTC,Negative,Neutral,Positive,Volume
 AXP,2013-06-01 16:00:00+00:00,0,2,0,2
 AXP,2013-06-01 17:00:00+00:00,0,2,0,2
 AXP,2013-06-02 05:00:00+00:00,0,1,0,1
 AXP,2013-06-02 16:00:00+00:00,0,2,0,2

My code:

 Tweets$Datetime_UTC <- as.Date(Tweets$Datetime)
 Sent <- aggregate(list(Tweets$Negative, Tweets$Neutral, Tweets$Positive), by=list(Tweets$Company, Tweets$Datetime_UTC), sum)
 colnames(Sent) <- c("Company", "Date", "Negative", "Neutral", "Positive")
 Sent <- Sent[order(Sent$Company),]

Output of that code:

 Company,Date,Negative,Neutral,Positive
 AXP,2013-06-01,0,4,0
 AXP,2013-06-02,0,3,0

How I'd want it to be (considering that a day should start at 16:00):

 Company,Date,Negative,Neutral,Positive
 AXP,2013-06-02,0,5,0
 AXP,2013-06-03,0,2,0  

As you can see, my code almost works. I just want to group by a different time window.

How can I do this? One idea would be to just add +8h to every single Datetime_UTC, which would change 16:00 into 00:00. After this, I could just use my code. Would that be possible?

Thanks in advance!! :-)

Answer 1:

Effectively what you're doing is redefining a date to start at 16:00 instead of 00:00. One option would be to convert to epoch time (seconds since 1970-01-01 00:00:00+00:00) and simply slide your data forward by eight hours.
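To see what the eight-hour slide does at the boundary, here is a minimal sketch. The two timestamps are made up for illustration (one minute before and exactly at 16:00 UTC), and `x` and `shifted` are just illustrative names.

```r
# Two hypothetical timestamps on either side of the 16:00 UTC cut-off
x <- as.POSIXct(c("2013-06-01 15:59:00", "2013-06-01 16:00:00"), tz = "UTC")

# POSIXct stores seconds since the epoch, so adding 8 * 3600 slides by 8 hours
shifted <- as.Date(x + 8 * 3600, tz = "UTC")

# 15:59 stays on 2013-06-01, 16:00 rolls over to 2013-06-02
print(shifted)
```

Note that the shifted date labels each window by the day it ends on, which matches the desired output in the question.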

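You can parse the column as POSIXct (which stores epoch seconds internally), add 8 hours' worth of seconds (28800), and then convert back to Date class, all in one line. Then you would just aggregate as you had been. Note that passing a plain number to as.Date would be read as days, not seconds, so the addition has to happen on the POSIXct value.

 Tweets$Datetime_UTC <- as.Date(as.POSIXct(Tweets$Datetime_UTC, tz = "UTC") + 28800, tz = "UTC")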

Replace your first line of code with that and it should do the trick.
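
For reference, a self-contained sketch of the whole approach on the four sample rows from the question. It assumes the timestamps arrive as text in the format shown there; the names `stamp` and `Date` are only illustrative, and the formula interface of `aggregate` is swapped in for the `list(...)` call from the question.

```r
# Sample rows from the question, read from inline text
Tweets <- read.csv(text = "Company,Datetime_UTC,Negative,Neutral,Positive,Volume
AXP,2013-06-01 16:00:00+00:00,0,2,0,2
AXP,2013-06-01 17:00:00+00:00,0,2,0,2
AXP,2013-06-02 05:00:00+00:00,0,1,0,1
AXP,2013-06-02 16:00:00+00:00,0,2,0,2", stringsAsFactors = FALSE)

# Parse as UTC; the explicit format stops after the seconds, so the
# trailing "+00:00" offset is not used (assumes all rows are UTC)
stamp <- as.POSIXct(Tweets$Datetime_UTC, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Slide forward 8 hours so that 16:00 becomes 00:00 of the next day
Tweets$Date <- as.Date(stamp + 8 * 3600, tz = "UTC")

# Sum the sentiment counts per company and shifted date
Sent <- aggregate(cbind(Negative, Neutral, Positive) ~ Company + Date,
                  data = Tweets, FUN = sum)
Sent <- Sent[order(Sent$Company, Sent$Date), ]
print(Sent)
```

On these rows the result has two lines for AXP: 2013-06-02 with Neutral 5 and 2013-06-03 with Neutral 2, which is the output the question asks for.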