Suppose I have a data frame (df) that looks like this:
options(stringsAsFactors = F)
cars <- c("Car1", "Car2", "Car3", "Car4", "Car5", "Car6", "Car7", "Car8", "Car9")
test1 <- c(0,0,3,1,4,2,1,3,0)
test2 <- c(0,0,2,1,0,2,2,5,0)
test3 <- c(1,0,5,1,2,2,6,7,0)
test4 <- c(2,NA,2,1,2,2,1,1,0)
test5 <- c(0,0,1,1,0,2,1,3,0)
test6 <- c(1,0,1,1,1,2,3,4,0)
test7 <- c(3,0,2,1,0,2,1,1,0)
df <- data.frame(cars,test1,test2,test3,test4,test5,test6,test7)
#df
#  cars test1 test2 test3 test4 test5 test6 test7
#1 Car1 0 0 1 2 0 1 3
#2 Car2 0 0 0 NA 0 0 0
#3 Car3 3 2 5 2 1 1 2
#4 Car4 1 1 1 1 1 1 1
#5 Car5 4 0 2 2 0 1 0
#6 Car6 2 2 2 2 2 2 2
#7 Car7 1 2 6 1 1 3 1
#8 Car8 3 5 7 1 3 4 1
#9 Car9 0 0 0 0 0 0 0
I want to remove any rows that have the same value throughout the entire row (in the example above, I would like to keep rows 1, 3, 5, 7, 8 and remove the rest).
I've figured out how to pick out the rows whose values are all zero:
df$sum <- rowSums(df[, 2:8], na.rm = TRUE)
df.all0 <- df[which(df$sum == 0), ]
However, this only catches the all-zero rows; it misses rows such as Car4 and Car6 that repeat a non-zero value. Unlike other questions, this one asks about identical values across the entire row, not duplicates in specific columns.
Any help would be greatly appreciated!
Here is an option with rowSums; the logic is to check whether any value in the row differs (NA doesn't count) from one of the columns you are interested in. We can also use Map with Reduce.
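The original code for these two base R options is not shown here, so the following is only a sketch of how they might look. It assumes test1 is the reference column and that it contains no NA (a row with NA in test1 would not compare cleanly). The names tests, n_diff, differs, kept_rowsums and kept_reduce are made up for illustration.

```r
# Rebuild the example data so the sketch runs on its own.
df <- data.frame(
  cars  = paste0("Car", 1:9),
  test1 = c(0, 0, 3, 1, 4, 2, 1, 3, 0),
  test2 = c(0, 0, 2, 1, 0, 2, 2, 5, 0),
  test3 = c(1, 0, 5, 1, 2, 2, 6, 7, 0),
  test4 = c(2, NA, 2, 1, 2, 2, 1, 1, 0),
  test5 = c(0, 0, 1, 1, 0, 2, 1, 3, 0),
  test6 = c(1, 0, 1, 1, 1, 2, 3, 4, 0),
  test7 = c(3, 0, 2, 1, 0, 2, 1, 1, 0),
  stringsAsFactors = FALSE
)

tests <- df[-1]  # the test columns only (drops cars)

# rowSums: per row, count the values that differ from test1.
# Comparisons involving NA are NA and are dropped by na.rm = TRUE.
n_diff <- rowSums(tests != tests[[1]], na.rm = TRUE)
kept_rowsums <- df[n_diff != 0, ]

# Map + Reduce: build one logical vector per column (TRUE where the value
# is non-NA and differs from test1), then combine them with `|`.
differs <- Map(function(col) !is.na(col) & col != tests[[1]], tests)
kept_reduce <- df[Reduce(`|`, differs), ]

kept_rowsums$cars  # Car1, Car3, Car5, Car7, Car8
```

Both versions keep a row as soon as one non-NA value differs from test1, which is why Car2 (all zeros plus one NA) is dropped along with Car4, Car6 and Car9.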
Or using tidyverse.
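The tidyverse code is likewise missing here, so this is one possible sketch rather than the original. It assumes dplyr 1.0.4 or later (for if_any), treats test1 as the reference column, and assumes test1 has no NA. The name kept is made up for illustration.

```r
library(dplyr)

# Rebuild the example data so the sketch runs on its own.
df <- data.frame(
  cars  = paste0("Car", 1:9),
  test1 = c(0, 0, 3, 1, 4, 2, 1, 3, 0),
  test2 = c(0, 0, 2, 1, 0, 2, 2, 5, 0),
  test3 = c(1, 0, 5, 1, 2, 2, 6, 7, 0),
  test4 = c(2, NA, 2, 1, 2, 2, 1, 1, 0),
  test5 = c(0, 0, 1, 1, 0, 2, 1, 3, 0),
  test6 = c(1, 0, 1, 1, 1, 2, 3, 4, 0),
  test7 = c(3, 0, 2, 1, 0, 2, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Keep a row if any of test2..test7 is non-NA and differs from test1.
kept <- df %>%
  filter(if_any(test2:test7, ~ !is.na(.x) & .x != test1))

kept$cars  # Car1, Car3, Car5, Car7, Car8
```

The !is.na(.x) guard makes an NA count as "no difference", matching the na.rm = TRUE behaviour of the rowSums idea.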