Question:
This is an example of what my data set (MergedData) looks like in R, where each of my participants (5 rows) obtained a score on every test (7 columns). I would like to know the total score of all tests combined (all columns), calculated for each participant (row).
Also, my complete data set has more than just these few variables, so if possible, I would like to do it using a formula & loop and not having to type row by row/column by column.
Participant TestScores
ParticipantA 2 4 2 3 2 3 4
ParticipantB 1 3 2 2 3 3 3
ParticipantC 1 4 4 2 3 4 2
ParticipantD 2 4 2 3 2 4 4
ParticipantE 1 3 2 2 2 2 2
I have tried this but it doesn't work:
Test_Scores <- rowSums(MergedData[Test1, Test2, Test3], na.rm=TRUE)
I get the following error-message:
Error in `[.data.frame`(MergedData, Test1, Test2, Test3, :
unused arguments
How do I solve this? Thank you!!
Answer 1:
I think you want this:
rowSums(MergedData[,c('Test1', 'Test2', 'Test3')], na.rm=TRUE)
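The original call fails because indexing a data frame with [ accepts at most a row index and a column index, so three bare names separated by commas are read as extra, unused arguments. A minimal sketch of the corrected call follows, assuming the columns really are named Test1, Test2 and Test3; the small data frame is a made-up stand-in for MergedData using the first three scores of the first three participants.

```r
# Hypothetical stand-in for MergedData (first three tests, first three participants)
MergedData <- data.frame(
  Participant = c("ParticipantA", "ParticipantB", "ParticipantC"),
  Test1 = c(2, 1, 1),
  Test2 = c(4, 3, 4),
  Test3 = c(2, 2, 4)
)

# Column names go in ONE quoted character vector, in the column position
# (after the comma); the row position before the comma is left empty.
Test_Scores <- rowSums(MergedData[, c("Test1", "Test2", "Test3")], na.rm = TRUE)
print(Test_Scores)
```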
Answer 2:
You could use:
MergedData$Test_Scores_Sum <- rowSums(MergedData[,2:8], na.rm=TRUE)
Here 2:8 covers all the columns (tests) you want to sum up. This creates another column in your data.
This way you don't have to type each column name, and you can still have other columns in your data frame that will not be summed. Note, however, that all the test columns you want to sum should be next to each other (as in your example data).
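If the test columns are not next to each other, one option is to pick them by name pattern instead of by position. This is only a sketch: it assumes the test columns are named Test followed by a number, and the Age column is a hypothetical extra variable added to show that non-test columns are skipped.

```r
# Hypothetical data with a non-test column sitting between the tests
MergedData <- data.frame(
  Participant = c("ParticipantA", "ParticipantB"),
  Test1 = c(2, 1),
  Age   = c(30, 25),
  Test2 = c(4, 3),
  Test3 = c(2, 2)
)

# Names of every column that is "Test" followed by digits, wherever it sits.
# The anchored pattern also keeps the new Test_Scores_Sum column from being
# picked up if this code is run a second time.
test_cols <- grep("^Test[0-9]+$", names(MergedData), value = TRUE)

MergedData$Test_Scores_Sum <- rowSums(MergedData[, test_cols], na.rm = TRUE)
print(MergedData)
```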
Answer 3:
Please consult the documentation for ?rowSums and ?colSums.
It's not clear from your post exactly what MergedData is. Assuming it's a data.frame, the problem is your indexing MergedData[Test1, Test2, Test3]. If it is a data.frame, you'd like to run something like:
Test_Scores <- rowSums(MergedData, na.rm = TRUE)
or
Test_Scores <- rowSums(MergedData[, c("Test1", "Test2", "Test3")], na.rm = TRUE)
if you only want to use the columns named "Test1", "Test2", and "Test3" (if they are indeed named so).
If this doesn't work, please show us the output of str(MergedData).
You need to provide a minimal reproducible example of the error to get any really helpful answers.
Answer 4:
For small data, it might be interesting to convert the data.frame to a table and then use addmargins().
With this sample data
MergedData<-data.frame(Participant=letters[1:5],
Test1 = c(2,1,1,2,1),
Test2 = c(4,3,4,4,3),
Test3 = c(2,2,4,2,2),
Test4 = c(3,2,2,3,2),
Test5 = c(2,3,3,2,2)
)
and this helper function
as.table.data.frame <- function(x, rownames = 0) {
  numerics <- sapply(x, is.numeric)
  chars <- which(sapply(x, function(x) is.character(x) || is.factor(x)))
  names <- if (!is.null(rownames)) {
    if (length(rownames) == 1) {
      if (rownames == 0) {
        rownames(x)
      } else {
        as.character(x[, rownames])
      }
    } else {
      rownames
    }
  } else {
    if (length(chars) == 1) {
      as.character(x[, chars])
    } else {
      rownames(x)
    }
  }
  x <- as.matrix(x[, numerics])
  rownames(x) <- names
  structure(x, class = "table")
}
you could do
addmargins(as.table(MergedData, rownames = 1))
to get
    Test1 Test2 Test3 Test4 Test5 Sum
a       2     4     2     3     2  13
b       1     3     2     2     3  11
c       1     4     4     2     3  14
d       2     4     2     3     2  13
e       1     3     2     2     2  10
Sum     7    18    12    12    12  61
Probably not super useful in this case, but a fun use of addmargins nonetheless.
Answer 5:
Four previous answers and only one showing a result? What's up with that? Here's one
> dat <- read.table(header=T, text =
'Participant Test1 Test2 Test3 Test4 Test5 Test6 Test7
ParticipantA 2 4 2 3 2 3 4
ParticipantB 1 3 2 2 3 3 3
ParticipantC 1 4 4 2 3 4 2
ParticipantD 2 4 2 3 2 4 4
ParticipantE 1 3 2 2 2 2 2')
You wrote that
"...if possible, I would like to do it using a formula & loop and not having to type row by row/column by column"
You won't have to write any loops at all. The row and column functions operate on all the rows and all the columns, with no explicit looping.
> rowSums(dat[-1], na.rm = TRUE)
## [1] 20 17 20 21 14
> colSums(dat[-1], na.rm = TRUE)
## Test1 Test2 Test3 Test4 Test5 Test6 Test7
## 7 18 12 12 12 16 15
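To make the "no loops" point concrete, here is a small sketch comparing rowSums with a hand-written for loop over the same data. The loop is shown for comparison only; the names totals and loop_totals are made up for this illustration.

```r
dat <- read.table(header = TRUE, text =
'Participant Test1 Test2 Test3 Test4 Test5 Test6 Test7
ParticipantA 2 4 2 3 2 3 4
ParticipantB 1 3 2 2 3 3 3
ParticipantC 1 4 4 2 3 4 2
ParticipantD 2 4 2 3 2 4 4
ParticipantE 1 3 2 2 2 2 2')

# Vectorised version: one call, all rows at once (dat[-1] drops Participant)
totals <- rowSums(dat[-1], na.rm = TRUE)

# Equivalent explicit loop, one row at a time
loop_totals <- numeric(nrow(dat))
for (i in seq_len(nrow(dat))) {
  loop_totals[i] <- sum(unlist(dat[i, -1]), na.rm = TRUE)
}

# Label the totals with the participant names for readability
names(totals) <- as.character(dat$Participant)
print(totals)
print(all(loop_totals == totals))
```

Both approaches give the same totals; rowSums is simply shorter and does not depend on how many test columns the data has.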
Answer 6:
Here's a way to do it with dplyr and reshape2:
dat <- read.table(header=T, text =
'Participant Test1 Test2 Test3 Test4 Test5 Test6 Test7
ParticipantA 2 4 2 3 2 3 4
ParticipantB 1 3 2 2 3 3 3
ParticipantC 1 4 4 2 3 4 2
ParticipantD 2 4 2 3 2 4 4
ParticipantE 1 3 2 2 2 2 2')
library(dplyr)
library(reshape2)
# Melt data into long format
dat.l <- melt(dat, id.vars="Participant", variable.name="Test")
> dat.l
Participant Test value
1 ParticipantA Test1 2
2 ParticipantB Test1 1
3 ParticipantC Test1 1
4 ParticipantD Test1 2
...
32 ParticipantB Test7 3
33 ParticipantC Test7 2
34 ParticipantD Test7 4
35 ParticipantE Test7 2
# Sum by Participant
dat.l %>%
group_by(Participant) %>%
summarise(Sum=sum(value))
Participant Sum
1 ParticipantA 20
2 ParticipantB 17
3 ParticipantC 20
4 ParticipantD 21
5 ParticipantE 14
# Sum by Test
dat.l %>%
group_by(Test) %>%
summarise(Sum=sum(value))
Test Sum
1 Test1 7
2 Test2 18
3 Test3 12
4 Test4 12
5 Test5 12
6 Test6 16
7 Test7 15
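For readers who do not have dplyr or reshape2 installed, the same long-format idea can be sketched in base R with stack() and tapply(). This is an alternative to the answer's approach, not part of it; the names long, by_participant and by_test are chosen here for illustration.

```r
dat <- read.table(header = TRUE, text =
'Participant Test1 Test2 Test3 Test4 Test5 Test6 Test7
ParticipantA 2 4 2 3 2 3 4
ParticipantB 1 3 2 2 3 3 3
ParticipantC 1 4 4 2 3 4 2
ParticipantD 2 4 2 3 2 4 4
ParticipantE 1 3 2 2 2 2 2')

# stack() puts the test columns on top of each other:
# 'values' holds the scores, 'ind' holds the test name
long <- stack(dat[-1])

# Columns are stacked one after another, so the participant labels repeat
# once per test column
long$Participant <- rep(as.character(dat$Participant), times = ncol(dat) - 1)

# Sum by participant and by test
by_participant <- tapply(long$values, long$Participant, sum)
by_test <- tapply(long$values, long$ind, sum)

print(by_participant)
print(by_test)
```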