Let's say I have a data frame with 10 numeric variables V1-V10 (columns) and multiple rows (cases).
What I would like R to do is: For each case, give me the number of occurrences of a certain value in a set of variables.
For example, the number of occurrences of the numeric value 99 in a single row across V2, V3, and V6, which has a minimum of 0 (none of the three has the value 99) and a maximum of 3 (all three have the value 99).
I am really looking for an equivalent to the SPSS function COUNT: "COUNT creates a numeric variable that, for each case, counts the occurrences of the same value (or list of values) across a list of variables."
I thought about table() and plyr's count(), but I cannot really figure it out. Vectorized computation preferred. Thanks a lot!
Here is another straightforward solution that comes closest to what the COUNT command in SPSS does: creating a new variable that, for each case (i.e., row), counts the occurrences of a given value or list of values across a list of variables.
The updated data frame contains the new variable count.1 exactly as the SPSS COUNT command would do.
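As a minimal sketch of this idea (the data frame below is made up for illustration, and the exact call may differ from what this answer originally used), a row-wise apply() can count the cells equal to 1:

```r
# Hypothetical example data; the values are invented for illustration.
df <- data.frame(
  V1 = c(1, 1, 2, 1, NA),
  V2 = c(1, NA, 2, 2, 1),
  V3 = c(1, 2, 1, 1, 2),
  V4 = c(NA, 1, 2, 2, 1)
)

vars <- c("V1", "V2", "V3", "V4")

# For each row, count how many of V1-V4 equal 1.
# na.rm = TRUE means NA cells are simply not counted.
df$count.1 <- apply(df[, vars], 1, function(x) sum(x == 1, na.rm = TRUE))
```

With this data, count.1 comes out as 3, 2, 1, 2, 2 for the five rows.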
You can do the same to count how many times the value "2" occurs per row in V1-V4. Note that you need to select the columns (variables) in df to which the function is applied.
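A sketch of why the column selection matters, assuming df already holds a previously computed count.1 column (hypothetical data again): without selecting V1-V4, the count column itself would be scanned for the value 2.

```r
# Hypothetical data frame that already contains a count column.
df <- data.frame(
  V1 = c(1, 1, 2, 1, NA),
  V2 = c(1, NA, 2, 2, 1),
  V3 = c(1, 2, 1, 1, 2),
  V4 = c(NA, 1, 2, 2, 1),
  count.1 = c(3, 2, 1, 2, 2)
)

vars <- c("V1", "V2", "V3", "V4")

# Restrict the row-wise count to V1-V4 so that count.1 is not included.
df$count.2 <- apply(df[, vars], 1, function(x) sum(x == 2, na.rm = TRUE))
```

Here count.2 is 0, 1, 3, 2, 1. Applying the function to the whole of df instead would also count the 2s stored in count.1.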
You can also apply a similar logic to count the number of missing values in V1-V4.
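For missing values, one vectorized way (a sketch on made-up data, not necessarily the code this answer used) is to combine is.na() with rowSums():

```r
# Hypothetical example data with a few missing values.
df <- data.frame(
  V1 = c(1, 1, 2, 1, NA),
  V2 = c(1, NA, 2, 2, 1),
  V3 = c(1, 2, 1, 1, 2),
  V4 = c(NA, 1, 2, 2, 1)
)

vars <- c("V1", "V2", "V3", "V4")

# is.na() returns a logical matrix; rowSums() counts the TRUEs per row.
df$count.na <- rowSums(is.na(df[, vars]))
```

With this data, count.na is 1, 1, 0, 0, 1.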
The final result should be exactly what you wanted:
This solution can easily be generalized to a range of values. Suppose we want to count how many times a value of 1 or 2 occurs in V1-V4 per row:
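One way to sketch the multi-value case is with %in%, which tests membership in a set of values and treats NA as a non-match, so no na.rm argument is needed. The data below is hypothetical:

```r
# Hypothetical example data.
df <- data.frame(
  V1 = c(1, 1, 2, 1, NA),
  V2 = c(1, NA, 2, 2, 1),
  V3 = c(1, 2, 1, 1, 2),
  V4 = c(NA, 1, 2, 2, 1)
)

vars <- c("V1", "V2", "V3", "V4")

# For each row, count the cells whose value is 1 or 2.
df$count.1or2 <- apply(df[, vars], 1, function(x) sum(x %in% c(1, 2)))
```

Here count.1or2 is 3, 3, 4, 4, 3; the rows with one NA score 3 because the NA never matches.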
Try
where df is your data.frame. This will return a list with one item per row of your data.frame. Each item of the list corresponds to a row of the data.frame (in the same order), and it is a table where the content is the number of occurrences and the names are the corresponding values. For instance:
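A sketch of one way to build such a list on made-up data. It uses lapply() over the row indices rather than apply(df, 1, table), because apply() may simplify the result to a matrix when every row happens to have the same number of distinct values:

```r
# Hypothetical example data.
df <- data.frame(
  V1 = c(1, 2, 99),
  V2 = c(1, 99, 99),
  V3 = c(3, 2, 99)
)

# One table per row: names are the values, contents are their counts.
row_tables <- lapply(seq_len(nrow(df)), function(i) table(unlist(df[i, ])))

# Number of times 99 occurs in the second row (here: 1).
row_tables[[2]][["99"]]
```

Values that do not occur in a row are absent from that row's table, so check names(row_tables[[i]]) before indexing by a value that may be missing.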
I think that there ought to be a simpler way to do this, but the best way that I can think of to get a table of counts is to loop (implicitly using sapply) over the unique values in the dataframe.
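A sketch of that implicit loop, on hypothetical data: sapply() runs over the unique values and, for each one, rowSums() counts the matching cells per row. The result is a matrix with one row per case and one column per value.

```r
# Hypothetical example data.
df <- data.frame(
  V1 = c(1, 2, 99),
  V2 = c(1, 99, 99),
  V3 = c(3, 2, 99)
)

# All distinct values in the data frame (sort() also drops NA).
vals <- sort(unique(unlist(df)))

# For each value, count its occurrences in every row.
counts <- sapply(vals, function(v) rowSums(df == v, na.rm = TRUE))
colnames(counts) <- vals

# Occurrences of 99 in each row.
counts[, "99"]
```

For this data, the "99" column is 0, 1, 3.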
To count a particular word or letter in each row, for example the number of occurrences of L, just use
The result will appear like this
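A minimal sketch of such a count on a made-up character data frame (the column names A, B, C and count.L are hypothetical):

```r
# Hypothetical example data with letter codes.
df <- data.frame(
  A = c("L", "M", "L"),
  B = c("L", "L", "M"),
  C = c("M", "L", "M"),
  stringsAsFactors = FALSE
)

# Compare every cell with "L" and count the matches per row.
df$count.L <- rowSums(df[, c("A", "B", "C")] == "L")
```

For this data, count.L is 2, 2, 1.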