I have a very large dataframe with rows as observations and columns as genetic markers. I would like to create a new column that contains the sum of a select number of columns for each observation using R.
If I have 200 columns and 100 rows, I would like a to create a new column that has 100 rows with the sum of say columns 43 through 167. The columns have either 1 or 0. With the new column that contains the sum of each row, I will be able to sort the individuals who have the most genetic markers.
I feel it is something close to:
data$new=sum(data$[,43:167])
you can use rowSums
rowSums(data)
should give you what you want.
The rowSums function (as Greg mentions) will do what you want, but you are mixing subsetting techniques in your answer, do not use \"$\" when using \"[]\", your code should look something more like:
data$new <- rowSums( data[,43:167] )
If you want to use a function other than sum, then look at ?apply for applying general functions accross rows or columns.
I came here hoping to find a way to get the sum across all columns in a data table and run into issues implementing the above solutions. A way to add a column with the sum across all columns uses the cbind
function:
cbind(data, total = rowSums(data))
This method adds a total
column to the data and avoids the alignment issue yielded when trying to sum across ALL columns using the above solutions (see the post below for a discussion of this issue).
Adding a new column to matrix error