Reason for unexpected output in subsetting data fr

Posted 2020-04-14 17:15

Question:

I have the data frame "a" and it has a variable called "VAL". I want to count the elements where the value of VAL is 23 or 24.

I tried two versions, and both worked:

nrow(subset(a,VAL==23|VAL==24))
nrow(subset(a,VAL %in% c(23,24)))

However, I also tried another version that gives unexpected output, and I don't know why:

nrow(subset(a,VAL ==c(23,24)))

Even if I change the order of 23 and 24, it gives a different unexpected output.

nrow(subset(a,VAL ==c(24,23)))

Why are these two versions incorrect? What are they actually doing?

Answer 1:

Working through an example shows where it is going wrong:

a <- data.frame(VAL=c(1,1,1,23,24))
a
#  VAL
#1   1
#2   1
#3   1
#4  23
#5  24

These work:

a$VAL %in% c(23,24)
#[1] FALSE FALSE FALSE  TRUE  TRUE
a$VAL==23 | a$VAL==24
#[1] FALSE FALSE FALSE  TRUE  TRUE

The following doesn't work, because == compares element by element and recycles the shorter vector. Note the warning message:

a$VAL ==c(23,24)
#[1] FALSE FALSE FALSE FALSE FALSE
#Warning message:
#In a$VAL == c(23, 24) :
#  longer object length is not a multiple of shorter object length

This last comparison recycles the vector you are testing against, so it is effectively comparing:

c( 1,  1,  1, 23, 24) #to
c(23, 24, 23, 24, 23)

...so no rows are returned. Changing the order gives:

c( 1,  1,  1, 23, 24) #to
c(24, 23, 24, 23, 24)

...and two rows are returned. That matches the intended result only by pure luck, because the recycled 23 and 24 happen to line up with rows 4 and 5, so this form should not be relied on.
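
To make the recycling visible, here is a minimal sketch using the same example data. The names recycled, explicit and implicit are only illustrative. It builds the recycled vector by hand with rep_len(), checks that comparing against it gives the same result as the implicit comparison, and then counts matches with %in%, which does not depend on the order of the values:

```r
# Same example data as above.
a <- data.frame(VAL = c(1, 1, 1, 23, 24))

# Spell out the recycling that == performs implicitly:
# rep_len() repeats the shorter vector until it has nrow(a) elements.
recycled <- rep_len(c(23, 24), nrow(a))
recycled
# [1] 23 24 23 24 23

# Comparing against the hand-recycled vector gives the same logical
# vector as the implicit comparison (warning suppressed here on purpose).
explicit <- a$VAL == recycled
implicit <- suppressWarnings(a$VAL == c(23, 24))
identical(explicit, implicit)
# [1] TRUE

# Membership test: each VAL is checked against the whole set,
# so the order of 23 and 24 does not matter.
sum(a$VAL %in% c(23, 24))
# [1] 2
sum(a$VAL %in% c(24, 23))
# [1] 2
```

Note that R only warns when the longer length is not a multiple of the shorter one. With an even number of rows, the same recycling would happen silently.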
