Histogram of events grouped by month and day

Posted 2019-07-21 07:18

I am trying to make a histogram (or other plot) of the number of occurrences of each event from a data set spanning multiple years, grouped by month and day. Basically, I want a year-long x-axis starting from 1 March that shows how many times each date occurs, with the bars shaded by a categorical value. Below are the first 20 entries in the data set:

goose

Index   DateLost    DateLost1   Nested
1   2/5/1988    1988-02-05  N
2   5/20/1988   1988-05-20  N
3   1/31/1985   1985-01-31  N
4   9/6/1997    1997-09-06  Y
5   9/24/1996   1996-09-24  N
6   9/27/1996   1996-09-27  N
7   9/15/1997   1997-09-15  Y
8   1/18/1989   1989-01-18  Y
9   1/12/1985   1985-01-12  Y
10  2/12/1988   1988-02-12  N
11  1/12/1985   1985-01-12  Y
12  10/26/1986  1986-10-26  N
13  9/15/1988   1988-09-15  Y
14  12/30/1986  1986-12-30  N
15  1/19/1991   1991-01-19  N
16  1/7/1992    1992-01-07  N
17  10/9/1999   1999-10-09  N
18  10/20/1990  1990-10-20  N
19  10/25/2001  2001-10-25  N
20  9/23/1996   1996-09-23  Y

I have tried grouping using strftime, zoo, and lubridate, but then the plots don't recognize the time sequence or let me adjust the starting value. I have tried numerous methods using plot() and ggplot(), but either can't get the grouped data to plot correctly or can't get the data grouped. My best plot so far is from this code:

ggplot(goose, aes(x=DateLost1,fill=Nested))+ stat_bin(binwidth=100 ,position="identity") + scale_x_date("Date")

This gets me a nice plot, but over all years rather than one year. I have also played with the code from a previous answer here, "Understanding dates and plotting a histogram with ggplot2 in R", but am having trouble choosing a start date. Any help would be greatly appreciated. Let me know if I can provide the example data in an easier-to-use format.

1 Answer
我命由我不由天
#2 · 2019-07-21 08:08

Let's read in your data:

goose <- read.table(header = TRUE, text = "Index   DateLost    DateLost1   Nested
1   2/5/1988    1988-02-05  N
2   5/20/1988   1988-05-20  N
3   1/31/1985   1985-01-31  N
4   9/6/1997    1997-09-06  Y
5   9/24/1996   1996-09-24  N
6   9/27/1996   1996-09-27  N
7   9/15/1997   1997-09-15  Y
8   1/18/1989   1989-01-18  Y
9   1/12/1985   1985-01-12  Y
10  2/12/1988   1988-02-12  N
11  1/12/1985   1985-01-12  Y
12  10/26/1986  1986-10-26  N
13  9/15/1988   1988-09-15  Y
14  12/30/1986  1986-12-30  N
15  1/19/1991   1991-01-19  N
16  1/7/1992    1992-01-07  N
17  10/9/1999   1999-10-09  N
18  10/20/1990  1990-10-20  N
19  10/25/2001  2001-10-25  N
20  9/23/1996   1996-09-23  Y")

Now we can convert this to POSIXct format:

goose$DateLost1 <- as.POSIXct(goose$DateLost,
                              format = "%m/%d/%Y", 
                              tz = "GMT")
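Before going further it is worth checking that the format string really matches the data. A minimal sketch, using three rows copied from the table above and as.Date instead of as.POSIXct (my assumption being that the time of day does not matter here); the vector names are only for illustration:

```r
# Three rows copied from the question's data; the names are illustrative.
date_lost  <- c("2/5/1988", "5/20/1988", "9/6/1997")
date_lost1 <- c("1988-02-05", "1988-05-20", "1997-09-06")

# Parse the month/day/year strings.
parsed <- as.Date(date_lost, format = "%m/%d/%Y")

# An NA here would mean a string did not match the format.
stopifnot(!anyNA(parsed))

# The parsed dates should agree with the ISO-formatted column.
agrees <- parsed == as.Date(date_lost1)
print(agrees)
```

If any element of agrees were FALSE, the month and day are probably swapped in the format string.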

Then we need to figure out which year each loss belongs to, relative to March 1. Don't try to do this inside ggplot(). It takes some mucking about: first work out which March-to-February year each date falls in, then calculate the number of days after March 1 of that year.

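# Day of the year on which March 1 falls in each loss year (60, or 61 in leap years)
goose$DOTYMarch1 = as.numeric(format(as.POSIXct(paste0("3/1/",format(goose$DateLost1,"%Y")),
                                                format = "%m/%d/%Y",
                                                tz = "GMT"),
                              "%j"))
# Day of the year of the loss itself
goose$DOTYLost = as.numeric(format(goose$DateLost1,
                             "%j"))
# Year whose March 1 starts the season; >= keeps March 1 itself at day 0
goose$YLost = as.numeric(format(goose$DateLost1,"%Y")) + (as.numeric(goose$DOTYLost>=goose$DOTYMarch1) -1)
# Days since that March 1 (the name says March31, but it is measured from March 1);
# units = "days" stops difftime from picking seconds or hours automatically
goose$DOTYAfterMarch31Lost = as.numeric(difftime(goose$DateLost1,
                                                 as.POSIXct(paste0("3/1/",goose$YLost),
                                                            format = "%m/%d/%Y",
                                                            tz = "GMT"),
                                                 units = "days"))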
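The same offset can also be computed with plain Date arithmetic, which some may find easier to follow. This is only a sketch: days_since_march1() is a hypothetical helper name of mine, not a function from any package, and it assumes the input can be coerced with as.Date().

```r
# Hypothetical helper: days elapsed since the most recent 1 March.
days_since_march1 <- function(d) {
  d <- as.Date(d)
  yr <- as.integer(format(d, "%Y"))
  # 1 March of the same calendar year as each date.
  march1 <- as.Date(paste0(yr, "-03-01"))
  # Dates before 1 March belong to the season that began the year before.
  season_year <- ifelse(d >= march1, yr, yr - 1L)
  season_start <- as.Date(paste0(season_year, "-03-01"))
  # Date minus Date is always measured in days.
  as.integer(d - season_start)
}

offsets <- days_since_march1(as.Date(c("1988-03-01", "1988-02-05", "1997-09-06")))
print(offsets)
```

Because subtracting two Date objects always gives days, there is no need to worry about difftime units here.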

Then we can plot it. Your code was pretty much perfect already.

library(ggplot2)

p <- ggplot(goose, 
            aes(x=DOTYAfterMarch31Lost,
                fill=Nested))+ 
  stat_bin(binwidth=1,
           position="identity")
print(p)
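The x-axis is in days after March 1, which is hard to read. One possible way to label it with month names is to compute where each month starts in a reference year and pass those as breaks. This is a sketch under my own assumptions: the names month_order, month_starts, month_breaks and month_labels are illustrative, and the reference span (1 March 2001 to 28 February 2002) is an arbitrary non-leap choice, so labels can be off by one day for leap-year data.

```r
# Month order for a year that starts on 1 March.
month_order <- c(3:12, 1:2)

# First day of each month in an arbitrary non-leap reference season.
month_starts <- as.Date(sprintf("%d-%02d-01",
                                ifelse(month_order >= 3, 2001, 2002),
                                month_order))

# Days after 1 March at which each month begins.
month_breaks <- as.integer(month_starts - as.Date("2001-03-01"))
month_labels <- month.abb[month_order]

print(data.frame(month_labels, month_breaks))

# Apply to the plot object p from above, if it exists and ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE) && exists("p")) {
  p <- p +
    ggplot2::scale_x_continuous("Date lost",
                                breaks = month_breaks,
                                labels = month_labels) +
    ggplot2::coord_cartesian(xlim = c(0, 366))
  print(p)
}
```

coord_cartesian() is used for the limits so that the axis covers the whole year without dropping any bars.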

This gives a histogram of the number of losses for each day after March 1, with the bars filled by Nested.
