Inconsistent results in apply

This is basically the question asked here (not by me), but I've simplified the example and I simply can't figure out what is going on, so I decided I'd pose it again in a way that may get more responses.

Take data dd:

dd <- structure(list(first = c("118751", "55627", NA), one = c(41006L, 
119098L, 109437L), two = c(118751L, 109016L, 109831L), three = c(122631L, 
104639L, 120634L), four = c(38017L, 118950L, 105440L), five = c(114826L, 
122047L, 124347L), six = c(109438L, 55627L, 118679L), seven = c(27094L, 
107044L, 122161L), eight = c(112473L, 116909L, 124363L), nine = c(120586L, 
114711L, 120509L)), row.names = c(NA, 3L), class = "data.frame")

dd
   first    one    two  three   four   five    six  seven  eight   nine
1 118751  41006 118751 122631  38017 114826 109438  27094 112473 120586
2  55627 119098 109016 104639 118950 122047  55627 107044 116909 114711
3   <NA> 109437 109831 120634 105440 124347 118679 122161 124363 120509

Now, we want to find the rows where the number in column first equals the number in column six (which is the seventh column in the data frame), using apply:

apply(dd,1,function(x) as.integer(x["first"])==x[7])

    1     2     3 
FALSE FALSE    NA

This result is clearly wrong: row 2 should have produced a TRUE. Oddly, if I run the same thing ONLY on the second row, I get the correct answer:

apply(dd[2,],1,function(x) as.integer(x["first"])==x[7])

   2 
TRUE

I also tried other subsets: 1:2, 2:3, and even c(1,3). The last one gives me the expected result, while the first two keep insisting on a FALSE for row 2.

If I drop the apply, I get the correct response (regardless of subset):

as.integer(dd$first)==dd$six
[1] FALSE  TRUE    NA

What the hell is going on?

Tags: r apply

2 Answers

闹够了就滚

Reply #2 · 2019-08-30 02:49

The issue is your data types. Your first column is character, the rest of your columns are integer. You attempt to correct for this with as.integer() inside the apply, but it is too late. apply works on matrices, not data frames. When you give it a data frame, it is immediately converted to a matrix. Matrices can't have different column classes, and (generally) character can't be converted to numeric, so all your data is converted to character.

Here's a window into that conversion:

apply(dd, 1, print)
#       1        2        3       
# first "118751" "55627"  NA      
# one   " 41006" "119098" "109437"
# two   "118751" "109016" "109831"
# three "122631" "104639" "120634"
# four  " 38017" "118950" "105440"
# five  "114826" "122047" "124347"
# six   "109438" " 55627" "118679"
# seven " 27094" "107044" "122161"
# eight "112473" "116909" "124363"
# nine  "120586" "114711" "120509"

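You can see that spaces are added as well, unfortunately: each integer column is padded to a common width, so 55627 in column six becomes " 55627". Inside your function, the integer 55627 is compared with that string; R coerces the integer to "55627", and the two strings are not equal. With a single row there is nothing to pad against, which is why dd[2,] gives TRUE.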
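A quick way to confirm the padding is to look at the matrix that apply actually works on. This is only a minimal sketch: it assumes apply coerces the data frame with as.matrix, and it uses a cut-down copy of dd holding just the first and six columns. The names m, padded and single are made up for illustration.

```r
# Cut-down copy of dd: only the two columns being compared
dd <- data.frame(first = c("118751", "55627", NA),
                 six   = c(109438L, 55627L, 118679L),
                 stringsAsFactors = FALSE)

# Whole data frame: the integer column is formatted to a common width
m <- as.matrix(dd)
padded <- unname(m[2, "six"])
print(padded)                      # " 55627" (note the leading space)

# Single row: nothing to pad against
single <- unname(as.matrix(dd[2, ])[1, "six"])
print(single)                      # "55627"

# The same padding shows up when format() is called on the column
print(format(dd$six))
```

The single-row case explains why dd[2,] on its own gave TRUE in the question.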

Instead, convert your column to its proper type first. Or, better yet, don't bother with apply at all:

# convert
dd[, "first"] = as.integer(dd[, "first"])

# apply now works
apply(dd, 1, function(x) x["first"] == x[7])
#     1     2     3 
# FALSE  TRUE    NA 

# but isn't this easier?
dd[, "first"] == dd[, "six"]
# [1] FALSE  TRUE    NA
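
If the goal is the row numbers rather than a logical vector, which() can be applied to the vectorized comparison. A minimal sketch on a cut-down copy of dd (only first and six), assuming first is still character as in the question; hits and rows are illustrative names, not part of the original code.

```r
# Cut-down copy of dd: only the two columns being compared
dd <- data.frame(first = c("118751", "55627", NA),
                 six   = c(109438L, 55627L, 118679L),
                 stringsAsFactors = FALSE)

# Vectorized comparison; the NA in first stays NA
hits <- as.integer(dd$first) == dd$six

# which() keeps only the positions that are TRUE and drops the NA
rows <- which(hits)
print(rows)                        # 2

# Use the positions to pull out the matching rows
print(dd[rows, ])
```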

姐就是有狂的资本

Reply #3 · 2019-08-30 02:57

Wrapping x[7] in as.integer() fixes your problem:

apply(dd,1,function(x) as.integer(x["first"])==as.integer(x[7]))

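If you run the following code, you can see that as.integer(x["first"]) and x[7] have different types: integer and character. R still compares them, but it does so by coercing the integer to character, so the comparison fails whenever the character value has been padded with a space.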

apply(dd,1,function(x) return(list(class(as.integer(x["first"])), class(x[7]))))

$`1`
$`1`[[1]]
[1] "integer"

$`1`[[2]]
[1] "character"


$`2`
$`2`[[1]]
[1] "integer"

$`2`[[2]]
[1] "character"


$`3`
$`3`[[1]]
[1] "integer"

$`3`[[2]]
[1] "character"
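
To see the coercion in isolation, the comparison can be reproduced without apply. A small sketch, using the padded string that x[7] holds for row 2 as a literal; the variable names are made up, and the trimws() variant is an alternative I am adding, not something from the answer above.

```r
padded <- " 55627"          # what x[7] holds for row 2 inside apply

# Integer vs character: the integer is coerced to "55627",
# then the two strings are compared
cmp_raw <- 55627L == padded
print(cmp_raw)              # FALSE

# Converting the padded string back to integer ignores the leading space
cmp_int <- 55627L == as.integer(padded)
print(cmp_int)              # TRUE

# Trimming the whitespace before a string comparison also works
cmp_trim <- "55627" == trimws(padded)
print(cmp_trim)             # TRUE
```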
