-->

Using cut2 from Hmisc to calculate cuts for differ

2019-06-26 06:33发布

问题:

I was trying to calculate equal quantile cuts for a vector by using cut2 from Hmisc.

library(Hmisc)
c <- c(-4.18304,-3.18343,-2.93237,-2.82836,-2.13478,-2.01892,-1.88773,
       -1.83124,-1.74953,-1.74858,-0.63265,-0.59626,-0.5681)

cut2(c, g=3, onlycuts=TRUE)

[1] -4.18304 -2.01892 -1.74858 -0.56810

But I was expecting the following result (33%, 33%, 33%):

[1] -4.18304 -2.13478 -1.74858 -0.56810

Should I still use cut2 or try something different? How can I make it work? Thanks for your advice.

回答1:

You are seeing the cutpoints, but you want the tabular counts, and you want them as fractions of the total, so do this instead:

> prop.table(table(cut2(c, g=3) ) )

[-4.18,-2.019) [-2.02,-1.749) [-1.75,-0.568] 
     0.3846154      0.3076923      0.3076923 

(Obviously you cannot expect cut2 to create an exact split when the count of elements was not evenly divisible by 3.)



回答2:

It seems that there were accidentally thirteen values in the original data set, instead of twelve. Thirteen values cannot be equally divided into three quantile groups (as mentioned by BondedDust). Here is the original problem, except that one selected data value (-1.74953) is excluded, making it twelve values. This gives the result originally expected:

library(Hmisc)

c<-c(-4.18304,-3.18343,-2.93237,-2.82836,-2.13478,-2.01892,-1.88773,-1.83124,-1.74858,-0.63265,-0.59626,-0.5681)

cut2(c, g=3,onlycuts=TRUE)
#[1] -4.18304 -2.13478 -1.74953 -0.5681


To make it clearer to anyone not familiar with cut2 from the Hmisc package (like me as of this morning), here's a similar problem, except that we'll use the integers 1 through 12 (assigned to the vector dozen_values).

library(Hmisc)

dozen_values <-1:12

quantile_groups <- cut2(dozen_values,g=3)

levels(quantile_groups)
## [1] "[1, 5)" "[5, 9)" "[9,12]"

cutpoints <- cut2(dozen_values, g=3, onlycuts=TRUE)

cutpoints
## [1]  1  5  9 12

# Show which values belong to which quantile group, using a data frame
quantile_DF <- data.frame(dozen_values, quantile_groups)
names(quantile_DF) <- c("value", "quantile_group")

quantile_DF
##    value quantile_group
## 1      1         [1, 5)
## 2      2         [1, 5)
## 3      3         [1, 5)
## 4      4         [1, 5)
## 5      5         [5, 9)
## 6      6         [5, 9)
## 7      7         [5, 9)
## 8      8         [5, 9)
## 9      9         [9,12]
## 10    10         [9,12]
## 11    11         [9,12]
## 12    12         [9,12]

Notice that, the first quantile group includes everything up to, but not including, 5 (i.e. 1 thorough 4, in this case). The second quantile group contains 5 up to, but not including, 9 (i.e. 5 through 8, in this case). The third (last) quantile group contains 9 through 12, which includes the last value 12. Unlike the other quantile groups, the third quantile group includes the last value shown.

Anyway, you can see that the "cutpoints" 1, 5, 9, and 12 describe the start and end points of the quantile groups in the most concise way, but it is obtuse without reading relevant documentation (link to single page Inside-R site, instead of the almost 400 page PDF manual).

See this explanation about the parentheses vs square bracket notation, if it is unfamiliar to you.



标签: r quantile hmisc