I'm quite new to R and this is the first time I dare to ask a question here.
I'm working with a dataset with likert scales and I want to row sum over different group of columns which share the first strings in their name.
Below I constructed a data frame of only 2 rows to illustrate the approach I followed, though I would like to receive feedback on how I can write a more efficient way of doing it.
df <- as.data.frame(rbind(rep(sample(1:5),4),rep(sample(1:5),4)))
var.names <- c("emp_1","emp_2","emp_3","emp_4","sat_1","sat_2"
,"sat_3","res_1","res_2","res_3","res_4","com_1",
"com_2","com_3","com_4","com_5","cap_1","cap_2",
"cap_3","cap_4")
names(df) <- var.names
So, what I did, was to use the grep function in order to be able to sum the rows of the specified variables that started with certain strings and store them in a new variable. But I have to write a new line of code for each variable.
df$emp_t <- rowSums(df[, grep("\\bemp.", names(df))])
df$sat_t <- rowSums(df[, grep("\\bsat.", names(df))])
df$res_t <- rowSums(df[, grep("\\bres.", names(df))])
df$com_t <- rowSums(df[, grep("\\bcom.", names(df))])
df$cap_t <- rowSums(df[, grep("\\bcap.", names(df))])
But there is a lot more variables in the dataset and I would like to know if there is a way to do this with only one line of code. For example, some way to group the variables that start with the same strings together and then apply the row function.
Thanks in advance!
Agree with MrFlick that you may want to put your data in long format (see
reshape2
,tidyr
), but to answer your question:Will do the trick
You'll be better off in the long run if you put your data into tidy format. The problem is that the data is in a wide rather than a long format. And the variable names, e.g.,
emp_1
, are actually two separate pieces of data: the class of the person, and the person's ID number (or something like that). Here is a solution to your problem with dplyr and tidyr.First we convert the data frame from wide to long format with
gather()
. Then we split the valuesemp_1
into separate columnsclass
andid
withextract()
. Finally we group by the class and sum the values in each class. Result:One possible solution is to transpose
df
and calculate sums for the correct columns using base Rrowsum
function (usingset.seed(123)
)