Remove rows with the same value across all columns

Posted 2020-03-21 09:56

Suppose I have a data frame (df) that looks like below:

options(stringsAsFactors = F)

cars <- c("Car1", "Car2", "Car3", "Car4", "Car5", "Car6", "Car7", "Car8", "Car9")
test1 <- c(0,0,3,1,4,2,1,3,0)
test2 <- c(0,0,2,1,0,2,2,5,0)
test3 <- c(1,0,5,1,2,2,6,7,0)
test4 <- c(2,NA,2,1,2,2,1,1,0)
test5 <- c(0,0,1,1,0,2,1,3,0)
test6 <- c(1,0,1,1,1,2,3,4,0)
test7 <- c(3,0,2,1,0,2,1,1,0)

df <- data.frame(cars,test1,test2,test3,test4,test5,test6,test7)

#df
#  cars test1 test2 test3 test4 test5 test6 test7
#1 Car1     0     0     1     2     0     1     3
#2 Car2     0     0     0    NA     0     0     0
#3 Car3     3     2     5     2     1     1     2
#4 Car4     1     1     1     1     1     1     1
#5 Car5     4     0     2     2     0     1     0
#6 Car6     2     2     2     2     2     2     2
#7 Car7     1     2     6     1     1     3     1
#8 Car8     3     5     7     1     3     4     1
#9 Car9     0     0     0     0     0     0     0

I want to remove any rows that have the same value throughout the entire row (in the example above, I would like to keep rows 1, 3, 5, 7, 8 and remove the rest).

I've figured out how to pick out the rows that contain only zeros:

 df$sum <- rowSums(df[,c(2:8)], na.rm = T )
 df.all0 <- df[which(df$sum == 0),]

However, this doesn't generalize to rows filled with some other constant value (such as rows 4 and 6). Unlike other questions, this one is about a value repeated across the entire row, not duplicates in specific columns.

Any help would be greatly appreciated!

Tags: r dataframe
3 Answers
够拽才男人
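#2 · 2020-03-21 10:35
# For each row, count the distinct non-NA values in the test columns;
# keep the row unless exactly one distinct value remains.
keep <- apply(df[2:8], 1, function(x) length(unique(x[!is.na(x)])) != 1)
df[keep, ]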

  cars test1 test2 test3 test4 test5 test6 test7
1 Car1     0     0     1     2     0     1     3
3 Car3     3     2     5     2     1     1     2
5 Car5     4     0     2     2     0     1     0
7 Car7     1     2     6     1     1     3     1
8 Car8     3     5     7     1     3     4     1
唯我独甜
#3 · 2020-03-21 10:39

Here is an option with rowSums. The logic is to compare every test column against the first one (test1) and keep a row if at least one value differs; comparisons involving NA are ignored:

df[rowSums(df[-1] != df[[2]], na.rm = TRUE) != 0,]

#  cars test1 test2 test3 test4 test5 test6 test7
#1 Car1     0     0     1     2     0     1     3
#3 Car3     3     2     5     2     1     1     2
#5 Car5     4     0     2     2     0     1     0
#7 Car7     1     2     6     1     1     3     1
#8 Car8     3     5     7     1     3     4     1
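
One caveat with comparing against a single reference column: if test1 itself is NA in some row, every comparison in that row is NA, so the row is dropped even when its other values differ. A minimal sketch of an alternative that does not rely on a reference column compares the row-wise minimum and maximum instead. The helper name rows_vary and the small edge data frame are made up for illustration, and dropping rows that are entirely NA is an assumption about the desired behavior:

```r
# Rebuild the example data from the question.
df <- data.frame(
  cars  = paste0("Car", 1:9),
  test1 = c(0, 0, 3, 1, 4, 2, 1, 3, 0),
  test2 = c(0, 0, 2, 1, 0, 2, 2, 5, 0),
  test3 = c(1, 0, 5, 1, 2, 2, 6, 7, 0),
  test4 = c(2, NA, 2, 1, 2, 2, 1, 1, 0),
  test5 = c(0, 0, 1, 1, 0, 2, 1, 3, 0),
  test6 = c(1, 0, 1, 1, 1, 2, 3, 4, 0),
  test7 = c(3, 0, 2, 1, 0, 2, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for rows whose non-NA values are not all equal.
# Rows that are entirely NA yield FALSE (assumption: such rows are dropped).
rows_vary <- function(d) {
  cols <- unname(as.list(d))
  lo <- do.call(pmin, c(cols, na.rm = TRUE))  # row-wise minimum, ignoring NA
  hi <- do.call(pmax, c(cols, na.rm = TRUE))  # row-wise maximum, ignoring NA
  !is.na(lo) & lo != hi
}

keep <- rows_vary(df[-1])
df[keep, ]

# Made-up edge case: the first column is NA in a row whose other values differ.
edge <- data.frame(a = c(NA, 5), b = c(1, 5), c = c(2, 5))
rowSums(edge != edge[[1]], na.rm = TRUE) != 0  # reference-column approach
rows_vary(edge)                                # min/max approach
```

On the question's data both approaches select the same rows. They only diverge when the reference column contains NA, as in the first row of edge.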
闹够了就滚
#4 · 2020-03-21 11:00

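We can also use Map with Reduce: Map compares each column with test1 (treating NA as "not different"), and Reduce adds the per-column results into a per-row count of differing values: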

df[c(Reduce(`+`, Map(function(x,y) x != y & !is.na(x), df[-1], list(df[2]))) != 0),]
#  cars test1 test2 test3 test4 test5 test6 test7
#1 Car1     0     0     1     2     0     1     3
#3 Car3     3     2     5     2     1     1     2
#5 Car5     4     0     2     2     0     1     0
#7 Car7     1     2     6     1     1     3     1
#8 Car8     3     5     7     1     3     4     1
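
To make the intermediate steps visible, here is a rough sketch that splits the same idea into named pieces. The variable names ref, diffs and counts are introduced only for illustration, and the extra !is.na(ref) guard is an addition that the one-liner does not have:

```r
# Rebuild the example data from the question.
df <- data.frame(
  cars  = paste0("Car", 1:9),
  test1 = c(0, 0, 3, 1, 4, 2, 1, 3, 0),
  test2 = c(0, 0, 2, 1, 0, 2, 2, 5, 0),
  test3 = c(1, 0, 5, 1, 2, 2, 6, 7, 0),
  test4 = c(2, NA, 2, 1, 2, 2, 1, 1, 0),
  test5 = c(0, 0, 1, 1, 0, 2, 1, 3, 0),
  test6 = c(1, 0, 1, 1, 1, 2, 3, 4, 0),
  test7 = c(3, 0, 2, 1, 0, 2, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Reference column that every test column is compared against.
ref <- df[["test1"]]

# Step 1: one logical vector per test column, TRUE where the value is
# non-NA and differs from the reference value in the same row.
diffs <- Map(function(x) !is.na(x) & !is.na(ref) & x != ref, df[-1])

# Step 2: add the logical vectors element-wise, giving for each row the
# number of columns that differ from test1.
counts <- Reduce(`+`, diffs)

# Step 3: keep rows with at least one differing value.
df[counts != 0, ]
```

Inspecting counts directly shows how many columns disagree with test1 in each row, which can help when deciding whether a single differing value should be enough to keep a row.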

Or using the tidyverse (filter_at comes from dplyr):

library(tidyverse)
df %>% 
    filter_at(vars(starts_with("test")), any_vars((. != test1)))
#   cars test1 test2 test3 test4 test5 test6 test7
#1 Car1     0     0     1     2     0     1     3
#2 Car3     3     2     5     2     1     1     2
#3 Car5     4     0     2     2     0     1     0
#4 Car7     1     2     6     1     1     3     1
#5 Car8     3     5     7     1     3     4     1