What is the most elegant way to split data and pro

Posted 2020-03-19 07:08

Question:

I want to produce seasonal boxplots for a lot of different time series. I hope that the code below clearly illustrates what I want to do.

My question is how to do this in the most elegant way, with as few lines of code as possible. I can create a new object for each month with the "subset" function and then plot it, but that does not seem very elegant. I tried the "split" function, but I don't know how to proceed from there.

Please tell me if my question is not clearly stated or edit it to make it clearer.

Any direct help or links to other websites or posts would be greatly appreciated. Thanks for your time.

Here is the code:

## Create Data
Time <- seq(as.Date("2003/8/6"), as.Date("2011/8/5"), by = "2 weeks")
data <- rnorm(209, mean = 15, sd = 1)
DF <- data.frame(Time = Time, Data = data)
DF[,3] <- as.numeric(format(DF$Time, "%m"))
colnames(DF)[3] <- "Month"

## Create subsets
Jan <- subset(DF, Month == 1)
Feb <- subset(DF, Month == 2)
Mar <- subset(DF, Month == 3)
Apr <- subset(DF, Month == 4)

## Create boxplot
months <- c("Jan", "Feb", "Mar", "Apr")
boxplot(Jan$Data, Feb$Data, Mar$Data, Apr$Data, ylab = "Data", xlab = "Months", names = months)

## Try with "split" function
DF.split <- split(DF, DF$Month)
head(DF.split)

Answer 1:

You are better off picking out the month names directly with the "%b" format and using an ordered factor and the formula interface for boxplot:

DF$month <- factor(strftime(DF$Time,"%b"),levels=month.abb)
boxplot(Data~month,DF)
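As a sketch of how the "split" attempt from the question could also be finished: boxplot accepts a list of numeric vectors, so splitting only the Data column by month gives something it can plot directly. The set.seed call, the name by_month, and the use of length(Time) instead of the hard-coded 209 are additions for illustration, not part of the original post.

```r
# Rebuild the question's example data (set.seed added only for reproducibility)
set.seed(1)
Time <- seq(as.Date("2003/8/6"), as.Date("2011/8/5"), by = "2 weeks")
DF <- data.frame(Time = Time, Data = rnorm(length(Time), mean = 15, sd = 1))
DF$Month <- as.numeric(format(DF$Time, "%m"))

# split() returns a named list with one numeric vector per month ("1" to "12")
by_month <- split(DF$Data, DF$Month)

# boxplot() draws one box per list element; month.abb supplies English labels
boxplot(by_month, names = month.abb[as.integer(names(by_month))],
        xlab = "Months", ylab = "Data")
```

The formula interface shown above is shorter, but this form makes it clear what each box is built from.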



Answer 2:

Using 'ggplot2' (and @James' month names, thanks!):

library(ggplot2)
DF$month <- factor(strftime(DF$Time,"%b"),levels=month.abb)
ggplot(DF, aes(x=month, y=Data)) +
    geom_boxplot()

(BTW: note that in 'ggplot2', "The upper and lower 'hinges' correspond to the first and third quartiles (the 25th and 75th percentiles). This differs slightly from the method used by the boxplot function, and may be apparent with small samples." See the documentation.)
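A small sketch of the difference mentioned in that note, assuming a made-up six-value sample: base R's boxplot takes its hinges from fivenum (via boxplot.stats), while quantile with its default settings interpolates differently. The vector x is hypothetical and chosen only because the two results disagree on it.

```r
x <- c(1, 2, 3, 4, 5, 6)  # hypothetical small sample

# Hinges as used by base boxplot(): elements 2 and 4 of Tukey's five-number summary
hinges <- fivenum(x)[c(2, 4)]

# 25th and 75th percentiles with quantile()'s default interpolation (type 7)
quartiles <- quantile(x, c(0.25, 0.75), names = FALSE)

print(hinges)     # 2 and 5
print(quartiles)  # 2.25 and 4.75
```

With a couple of hundred observations spread over twelve months, as in the question, the gap is usually too small to notice.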



Answer 3:

To set the months as an ordered factor under any locale setting, use a trick found on the help page ?month.abb:

Sys.setlocale("LC_TIME", "German_Germany") # Windows locale name; on Linux/macOS use e.g. "de_DE.UTF-8"
DF$month <- factor(format(DF$Time, "%b"), levels=format(ISOdate(2000, 1:12, 1), "%b"))
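If English labels are wanted regardless of the session locale, one possible alternative (a sketch, not what this answer proposes) is to go through the numeric month. The "%m" format does not depend on the locale, so indexing month.abb with it avoids the NA values that matching localized "%b" output against month.abb can produce. The names month_num and month are illustrative.

```r
# Same date sequence as in the question
Time <- seq(as.Date("2003/8/6"), as.Date("2011/8/5"), by = "2 weeks")

# "%m" yields "01".."12" in every locale; convert to an integer index
month_num <- as.integer(format(Time, "%m"))

# Look up the English abbreviation and keep calendar order via the levels
month <- factor(month.abb[month_num], levels = month.abb)
print(levels(month))
```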

And you could plot it in lattice as well:

require(lattice)
bwplot(Data~month, DF, pch="|") # set pch to nice line instead of point