Count number of unique values per row [duplicate]

Posted 2020-04-03 06:13

Question:

I want to count the number of unique values per row.

For instance with this data frame:

example <- data.frame(var1 = c(2,3,3,2,4,5), 
                  var2 = c(2,3,5,4,2,5), 
                  var3 = c(3,3,4,3,4,5))

I want to add a column which counts the number of unique values per row; e.g. 2 for the first row (as there are 2's and 3's in the first row) and 1 for the second row (as there are only 3's in the second row).

Does anyone know a simple way to do this? So far I have only found code for counting the number of unique values per column.

Answer 1:

This apply function returns a vector of the number of unique values in each row:

apply(example, 1, function(x)length(unique(x)))

You can append it to your data.frame in either of the following two ways (here the new column is named count):

example <- cbind(example, count = apply(example, 1, function(x)length(unique(x))))

or

example$count <- apply(example, 1, function(x)length(unique(x)))
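As a quick check, the sketch below rebuilds the example data frame from the question and compares the apply result with the counts worked out by hand from each row. The variable name counts is only for illustration.

```r
# Rebuild the example data frame from the question
example <- data.frame(var1 = c(2, 3, 3, 2, 4, 5),
                      var2 = c(2, 3, 5, 4, 2, 5),
                      var3 = c(3, 3, 4, 3, 4, 5))

# MARGIN = 1 makes apply() pass each row to the function as a vector
counts <- apply(example, 1, function(x) length(unique(x)))

# Rows are (2,2,3), (3,3,3), (3,5,4), (2,4,3), (4,2,4), (5,5,5),
# so the expected counts are 2 1 3 3 2 1
print(counts)
```

Note that apply converts the data frame to a matrix first, so this works most predictably when all columns have the same type, as they do here.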


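Answer 2:

We can also use a vectorized approach with regex. After pasting the elements of each row into a single string (do.call(paste0, example)), we match any character and capture it as a group ((.)), then use a positive lookahead ((?=.*?\\1), where \\1 is a backreference to the captured group) so that the character matches only if it appears again later in the string, and replace each such match with an empty string (""). In effect, only the last occurrence of each distinct character remains. Then nchar counts the characters left in each string. Note that this treats every character as a separate value, so it gives correct counts only when each value is a single character, as with the single-digit numbers in this example.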

example$count <- nchar(gsub("(.)(?=.*?\\1)", "", do.call(paste0, example), perl = TRUE))
example$count
#[1] 2 1 3 3 2 1
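To make the one-liner easier to follow, here is a sketch that splits it into its three steps on the example data, then tries it on a small data frame with multi-digit values. The names pasted, deduped, multi, regex_count and apply_count are hypothetical and used only for illustration.

```r
# Example data from the question
example <- data.frame(var1 = c(2, 3, 3, 2, 4, 5),
                      var2 = c(2, 3, 5, 4, 2, 5),
                      var3 = c(3, 3, 4, 3, 4, 5))

# Step 1: paste each row into one string
pasted <- do.call(paste0, example)
# "223" "333" "354" "243" "424" "555"

# Step 2: delete every character that occurs again later in the same string
deduped <- gsub("(.)(?=.*?\\1)", "", pasted, perl = TRUE)
# "23" "3" "354" "243" "24" "5"

# Step 3: the remaining length is the number of distinct characters
nchar(deduped)
# 2 1 3 3 2 1

# Hypothetical row with multi-digit values: 10, 1, 0 has three unique values
multi <- data.frame(v1 = 10, v2 = 1, v3 = 0)
regex_count <- nchar(gsub("(.)(?=.*?\\1)", "", do.call(paste0, multi), perl = TRUE))
apply_count <- apply(multi, 1, function(x) length(unique(x)))
regex_count  # 2, because "1010" contains only the characters "1" and "0"
apply_count  # 3
```

For data that may contain values longer than one character, the apply approach from Answer 1 is the safer choice.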