Boxplot of pre-aggregated/grouped data in R

Posted 2019-06-20 08:13

In R I want to create a boxplot from count data instead of raw data. My table schema looks like this:

Value | Count
1 | 2
2 | 1

...

Instead of

Value
1
1
2
...

In the second case I could simply call boxplot(x).

Tags: r boxplot
4 answers
\"骚年 ilove
#2 · 2019-06-20 08:40

I'm sure there's a way to do what you want with the already summarized data, but if not, you can use the fact that rep accepts a vector of repeat counts:

> dat <- data.frame(Value = 1:5, Count = sample.int(5))
> dat
  Value Count
1     1     1
2     2     3
3     3     4
4     4     2
5     5     5
> rep(dat$Value, dat$Count)
 [1] 1 2 2 2 3 3 3 3 4 4 5 5 5 5 5

Wrap boxplot around that, i.e. boxplot(rep(dat$Value, dat$Count)), and you should get what you want. There may be a more efficient way to do this, but it should work for you.
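A minimal sketch of that step, using the two rows from the question's table as toy data (the names counts and expanded are made up for illustration):

```r
# Toy data copied from the question's table: value 1 occurs twice, value 2 once.
counts <- data.frame(Value = c(1, 2), Count = c(2, 1))

# rep() repeats each Value by its matching Count, rebuilding the raw vector.
expanded <- rep(counts$Value, counts$Count)

# The expanded vector can be passed to boxplot() like any raw sample.
boxplot(expanded)
```

Note that this materializes one element per observation, so it may use a lot of memory when the counts are very large.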

乱世女痞
#3 · 2019-06-20 08:46

Toy data:

(Besides Value and Count, I add a categorical variable, Group.)

set.seed(12345)
df <- data.frame(Value = sample(1:100, 100, replace = TRUE),
                 Count = sample(1:10, 100, replace = TRUE),
                 Group = sample(c("A", "B", "C"), 100, replace = TRUE),
                 stringsAsFactors = FALSE)

Use purrr::pmap and purrr::reduce to expand each row of the data frame into Count rows and bind the pieces together:

library(purrr)
data <- pmap(df, function(Value, Count, Group){
  data.frame(x = rep(Value, Count),
             y = rep(Group, Count))
}) %>% reduce(rbind)

boxplot(x ~ y, data = data)
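The same expansion can probably be done without purrr by indexing rows directly. The following is a base R sketch of that alternative, not the pmap approach above: it repeats each row index Count times and subsets the data frame. The names toy and expanded and the three rows of data are made up for illustration.

```r
# Made-up summarized data: one row per (Value, Group) pair with its Count.
toy <- data.frame(Value = c(10, 20, 30),
                  Count = c(2, 1, 3),
                  Group = c("A", "B", "A"),
                  stringsAsFactors = FALSE)

# Repeat each row index Count times, then subset to get one row per observation.
expanded <- toy[rep(seq_len(nrow(toy)), toy$Count), c("Value", "Group")]

# One box per group, drawn from the expanded rows.
boxplot(Value ~ Group, data = expanded)
```

This avoids building and binding one small data frame per row, which may matter when the summarized table has many rows.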

贪生不怕死
#4 · 2019-06-20 08:47

A combination of rep and data.frame can be used if another variable is needed for grouping.

E.g.:

with(data.frame(v1 = rep(data$v1, data$count), v2 = rep(data$v2, data$count)),
    boxplot(v1 ~ v2)
)

ら.Afraid
#5 · 2019-06-20 09:00

I solved a similar issue recently by using the 'apply' function on each column of counts with the 'rep' function:

> datablock <- apply(countblock[-1], 2, function(x){rep(countblock$value, x)})
> boxplot(datablock)

The above assumes that your values are in the first column and that the remaining columns contain count data.
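A small sketch of that layout, with a made-up countblock (the column names value, s1 and s2 are assumptions). It uses lapply instead of apply, a deliberate swap: apply returns a matrix only when every column expands to the same length, whereas lapply always returns a named list, which boxplot also accepts.

```r
# Hypothetical layout: values in the first column, one count column per sample.
countblock <- data.frame(value = 1:3,
                         s1 = c(2, 1, 1),
                         s2 = c(1, 1, 3))

# Expand each count column into its raw values; the totals differ (4 vs 5),
# so the result is kept as a list rather than a matrix.
datablock <- lapply(countblock[-1], function(x) rep(countblock$value, x))

# boxplot() draws one box per list element, labelled s1 and s2.
boxplot(datablock)
```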
