Remove rows with the same value across all columns

Suppose I have a data frame (df) that looks like this:

options(stringsAsFactors = FALSE)

cars <- c("Car1", "Car2", "Car3", "Car4", "Car5", "Car6", "Car7", "Car8", "Car9")
test1 <- c(0,0,3,1,4,2,1,3,0)
test2 <- c(0,0,2,1,0,2,2,5,0)
test3 <- c(1,0,5,1,2,2,6,7,0)
test4 <- c(2,NA,2,1,2,2,1,1,0)
test5 <- c(0,0,1,1,0,2,1,3,0)
test6 <- c(1,0,1,1,1,2,3,4,0)
test7 <- c(3,0,2,1,0,2,1,1,0)

df <- data.frame(cars,test1,test2,test3,test4,test5,test6,test7)

# df
#  cars test1 test2 test3 test4 test5 test6 test7
#1 Car1     0     0     1     2     0     1     3
#2 Car2     0     0     0    NA     0     0     0
#3 Car3     3     2     5     2     1     1     2
#4 Car4     1     1     1     1     1     1     1
#5 Car5     4     0     2     2     0     1     0
#6 Car6     2     2     2     2     2     2     2
#7 Car7     1     2     6     1     1     3     1
#8 Car8     3     5     7     1     3     4     1
#9 Car9     0     0     0     0     0     0     0

I want to remove any rows that have the same value throughout the entire row (in the example above, I would like to keep rows 1, 3, 5, 7, 8 and remove the rest).

I've figured out how to find the rows that contain only zeros:

df$sum <- rowSums(df[, 2:8], na.rm = TRUE)
df.all0 <- df[which(df$sum == 0), ]

However, this doesn't work for rows that repeat a nonzero value (such as rows 4 and 6). Unlike other questions, this one is about a value repeated across the entire row, not duplicates in specific columns.

Any help would be greatly appreciated!

Tags: r dataframe

3 answers

够拽才男人

#2 · 2020-03-21 10:35

keep <- apply(df[2:8], 1, function(x) length(unique(x[!is.na(x)])) != 1)
df[keep, ]

  cars test1 test2 test3 test4 test5 test6 test7
1 Car1     0     0     1     2     0     1     3
3 Car3     3     2     5     2     1     1     2
5 Car5     4     0     2     2     0     1     0
7 Car7     1     2     6     1     1     3     1
8 Car8     3     5     7     1     3     4     1
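
The one-liner above counts the distinct non-NA values in each row and keeps the rows where that count is not 1. As a minimal sketch of the same idea, the check can be wrapped in a helper and the test columns selected by name instead of by position. The name `is_constant_row` is hypothetical, and treating a row made up entirely of NA as constant (`<= 1` rather than `== 1`) is an assumption that the original one-liner does not make.

```r
# Hypothetical helper: TRUE when a row holds at most one distinct non-NA value.
# Assumption: a row that is entirely NA also counts as constant.
is_constant_row <- function(x) {
  length(unique(x[!is.na(x)])) <= 1
}

df <- data.frame(
  cars  = paste0("Car", 1:9),
  test1 = c(0, 0, 3, 1, 4, 2, 1, 3, 0),
  test2 = c(0, 0, 2, 1, 0, 2, 2, 5, 0),
  test3 = c(1, 0, 5, 1, 2, 2, 6, 7, 0),
  test4 = c(2, NA, 2, 1, 2, 2, 1, 1, 0),
  test5 = c(0, 0, 1, 1, 0, 2, 1, 3, 0),
  test6 = c(1, 0, 1, 1, 1, 2, 3, 4, 0),
  test7 = c(3, 0, 2, 1, 0, 2, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Pick the test columns by name so extra columns (such as df$sum) are ignored.
test_cols <- grep("^test", names(df))

# Apply the helper to each row, then drop the constant rows.
constant <- apply(df[test_cols], 1, is_constant_row)
result <- df[!constant, ]
result$cars
```

On the example data this keeps Car1, Car3, Car5, Car7 and Car8, matching the output above.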


唯我独甜

#3 · 2020-03-21 10:39

Here is an option with rowSums. The logic is to compare every column against one reference column (here test1) and keep a row if any value differs from it; comparisons that involve NA don't count:

df[rowSums(df[-1] != df[[2]], na.rm = TRUE) != 0,]

#  cars test1 test2 test3 test4 test5 test6 test7
#1 Car1     0     0     1     2     0     1     3
#3 Car3     3     2     5     2     1     1     2
#5 Car5     4     0     2     2     0     1     0
#7 Car7     1     2     6     1     1     3     1
#8 Car8     3     5     7     1     3     4     1
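
To make the rowSums logic easier to follow, here is a sketch that splits the one-liner into its intermediate steps. The variable names (`tests`, `differs`, `n_differs`, `kept`) are introduced only for illustration. It assumes that `df` holds just `cars` plus the seven test columns (so no `sum` column has been added) and that the reference column `test1` has no NA. If `test1` were NA in some row, every comparison in that row would be NA and the row would be dropped regardless of its other values.

```r
df <- data.frame(
  cars  = paste0("Car", 1:9),
  test1 = c(0, 0, 3, 1, 4, 2, 1, 3, 0),
  test2 = c(0, 0, 2, 1, 0, 2, 2, 5, 0),
  test3 = c(1, 0, 5, 1, 2, 2, 6, 7, 0),
  test4 = c(2, NA, 2, 1, 2, 2, 1, 1, 0),
  test5 = c(0, 0, 1, 1, 0, 2, 1, 3, 0),
  test6 = c(1, 0, 1, 1, 1, 2, 3, 4, 0),
  test7 = c(3, 0, 2, 1, 0, 2, 1, 1, 0),
  stringsAsFactors = FALSE
)

# All columns except the car names.
tests <- df[-1]

# Compare each column with test1; the result is NA wherever a value is NA.
differs <- tests != df[[2]]

# Count, per row, how many values differ from test1, ignoring NA.
n_differs <- rowSums(differs, na.rm = TRUE)

# Keep the rows with at least one differing value.
kept <- df[n_differs != 0, ]
kept$cars
```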


闹够了就滚

#4 · 2020-03-21 11:00

We can also use Map with Reduce:

df[c(Reduce(`+`, Map(function(x,y) x != y & !is.na(x), df[-1], list(df[2]))) != 0),]
#  cars test1 test2 test3 test4 test5 test6 test7
#1 Car1     0     0     1     2     0     1     3
#3 Car3     3     2     5     2     1     1     2
#5 Car5     4     0     2     2     0     1     0
#7 Car7     1     2     6     1     1     3     1
#8 Car8     3     5     7     1     3     4     1

Or using tidyverse:

library(tidyverse)
df %>% 
    filter_at(vars(starts_with("test")), any_vars((. != test1)))
#  cars test1 test2 test3 test4 test5 test6 test7
#1 Car1     0     0     1     2     0     1     3
#2 Car3     3     2     5     2     1     1     2
#3 Car5     4     0     2     2     0     1     0
#4 Car7     1     2     6     1     1     3     1
#5 Car8     3     5     7     1     3     4     1
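
In recent dplyr releases `filter_at()` and `any_vars()` are superseded, so the same filter can probably be written with `if_any()`. This is a sketch that assumes dplyr 1.0.4 or later is installed. The explicit `!is.na(.x)` guard is an addition, not part of the original answer, so that NA comparisons count as "not different" instead of relying on how `filter()` treats NA.

```r
library(dplyr)

df <- data.frame(
  cars  = paste0("Car", 1:9),
  test1 = c(0, 0, 3, 1, 4, 2, 1, 3, 0),
  test2 = c(0, 0, 2, 1, 0, 2, 2, 5, 0),
  test3 = c(1, 0, 5, 1, 2, 2, 6, 7, 0),
  test4 = c(2, NA, 2, 1, 2, 2, 1, 1, 0),
  test5 = c(0, 0, 1, 1, 0, 2, 1, 3, 0),
  test6 = c(1, 0, 1, 1, 1, 2, 3, 4, 0),
  test7 = c(3, 0, 2, 1, 0, 2, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Keep rows where at least one test column is non-NA and differs from test1.
result <- df %>%
  filter(if_any(starts_with("test"), ~ !is.na(.x) & .x != test1))

result$cars
```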

