Question:

When joining data.frames along a key, and one key has a missing value (NA), my intuition was that rows with an NA key should have no match in the second data.frame. To my surprise, if there are NAs in both data.frames, dplyr matches them as if they were values.

This is extra confusing because it was discussed at length in the issues on the dplyr repository (see here), and it seems to have been resolved. If so, I don't see how this is the correct solution; or perhaps I'm missing something.

I'm using dplyr 0.7.4.

t1 <- data.frame(a = as.character(c("1", "2", NA, NA, "4", "2")), b = c(1, 2, 3, 3, 4, 5), stringsAsFactors = FALSE)
t2 <- data.frame(a = as.character(c("1", "2", NA)), c = c("b", "n", "i"), stringsAsFactors = FALSE)
library(dplyr)
t1
#>      a b
#> 1    1 1
#> 2    2 2
#> 3 <NA> 3
#> 4 <NA> 3
#> 5    4 4
#> 6    2 5
t2
#>      a c
#> 1    1 b
#> 2    2 n
#> 3 <NA> i
left_join(t1, t2, by = "a")
#>      a b    c
#> 1    1 1    b
#> 2    2 2    n
#> 3 <NA> 3    i
#> 4 <NA> 3    i
#> 5    4 4 <NA>
#> 6    2 5    n

When in fact I would have expected the following:

#>      a b    c
#> 1    1 1    b
#> 2    2 2    n
#> 3 <NA> 3 <NA>
#> 4 <NA> 3 <NA>
#> 5    4 4 <NA>
#> 6    2 5    n

Answer 1:

The solution is to use the argument na_matches = "never". This was pointed out by Dani Rabaiotti and Hadley Wickham on Twitter.

This argument is documented in the left_join method for the tbl_df class: ?left_join.tbl_df
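
As a minimal sketch, assuming a dplyr version whose data frame joins accept na_matches (the question used 0.7.4), the following compares the default behaviour with na_matches = "never" on the question's data. The names default_join and strict_join are illustrative only.

```r
library(dplyr)

# Data from the question: both tables contain NA in the key column `a`.
t1 <- data.frame(a = c("1", "2", NA, NA, "4", "2"),
                 b = c(1, 2, 3, 3, 4, 5),
                 stringsAsFactors = FALSE)
t2 <- data.frame(a = c("1", "2", NA),
                 c = c("b", "n", "i"),
                 stringsAsFactors = FALSE)

# Default behaviour: NA keys in t1 are matched to the NA key in t2,
# so rows 3 and 4 pick up c = "i".
default_join <- left_join(t1, t2, by = "a")

# With na_matches = "never", NA keys are treated as having no match,
# so rows 3 and 4 get c = NA.
strict_join <- left_join(t1, t2, by = "a", na_matches = "never")

default_join$c
strict_join$c
```

Both results keep all six rows of t1 in their original order; only the c values for the NA-keyed rows differ.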

Answer 2:

This behaviour is the same as merge (although with some reordering).

merge(t1, t2, all.x = TRUE)
#>      a b    c
#> 1    1 1    b
#> 2    2 2    n
#> 3    2 5    n
#> 4    4 4 <NA>
#> 5 <NA> 3    i
#> 6 <NA> 3    i

You can get your expected output by setting incomparables=NA:

merge(t1, t2, all.x = TRUE, incomparables = NA)
#>      a b    c
#> 1    1 1    b
#> 2    2 2    n
#> 3    2 5    n
#> 4    4 4 <NA>
#> 5 <NA> 3 <NA>
#> 6 <NA> 3 <NA>

In dplyr this option doesn't appear to be documented, but the source of dplyr:::left_join.tbl_df shows an na_matches argument that looks promising. Some experimentation reveals that it needs the value "never".

left_join(t1, t2, by = "a", na_matches = "never")
#>      a b    c
#> 1    1 1    b
#> 2    2 2    n
#> 3 <NA> 3 <NA>
#> 4 <NA> 3 <NA>
#> 5    4 4 <NA>
#> 6    2 5    n
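
If na_matches is not available in a given setup, one possible workaround for a left join is to drop the NA-keyed rows from the right-hand table before joining. This is offered as a sketch rather than anything taken from the dplyr documentation, and the names t2_no_na and result are hypothetical.

```r
library(dplyr)

# Data from the question.
t1 <- data.frame(a = c("1", "2", NA, NA, "4", "2"),
                 b = c(1, 2, 3, 3, 4, 5),
                 stringsAsFactors = FALSE)
t2 <- data.frame(a = c("1", "2", NA),
                 c = c("b", "n", "i"),
                 stringsAsFactors = FALSE)

# Remove rows of t2 whose key is NA, so there is nothing for
# the NA keys in t1 to match against.
t2_no_na <- t2[!is.na(t2$a), ]

# A plain left join now leaves c as NA for the NA-keyed rows of t1.
result <- left_join(t1, t2_no_na, by = "a")
result
```

This only covers the left-join case from the question. For a full join, dropping rows from t2 would also drop them from the output, so it is not a general substitute for na_matches = "never".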

dplyr left_join matching NA
