Conditional mean statement

Posted 2019-05-01 17:44

Question:

I have a dataset named bwght which contains the variable cigs (cigarettes smoked per day).

When I calculate the mean of cigs in the dataset bwght using mean(bwght$cigs), I get 2.08.

Only 212 of the 1388 women in the sample smoke (and 1176 do not smoke):

summary(bwght$cigs>0) gives the result:

Mode      FALSE    TRUE    NA's 
logical    1176     212       0

I'm asked to find the average of cigs among the women who smoke (the 212).

I'm having a hard time finding the right syntax for excluding the non-smokers (cigs = 0). I have tried:

  • mean(bwght$cigs| bwght$cigs>0)

  • mean(bwght$cigs>0 | bwght$cigs=TRUE)

  • if (bwght$cigs > 0){ sum(bwght$cigs) }

  • x <-as.numeric(bwght$cigs, rm="0"); mean(x)

But nothing seems to work! Can anyone please help me??

Answer 1:

If you want to exclude the non-smokers, you have a few options. The easiest is probably this:

mean(bwght[bwght$cigs>0,"cigs"])

With a data frame, the first index selects rows and the second selects columns, so dataframe[1,2] returns the value in the first row, second column. The row index can also be a logical vector: using bwght$cigs>0 there keeps only the rows where cigs is greater than zero.
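As a small illustration of the row/column subsetting described above, here is a sketch on a made-up data frame. The name toy and its values are invented for the example and are not the real bwght data.

```r
# Toy stand-in for bwght (made-up values, not the real data)
toy <- data.frame(id   = 1:6,
                  cigs = c(0, 0, 10, 20, 0, 30))

# Rows where cigs > 0, column "cigs"; a plain data.frame returns a numeric vector here
smokers_mean <- mean(toy[toy$cigs > 0, "cigs"])

# Equivalent: pull the column out first, then filter the vector
smokers_mean2 <- mean(toy$cigs[toy$cigs > 0])

print(mean(toy$cigs))   # 10, mean over everyone
print(smokers_mean)     # 20, mean over smokers only
```

Both forms give the same number; the second avoids the data frame indexing entirely, which some people find easier to read.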

Your other ones didn't work for the following reasons:

mean(bwght$cigs| bwght$cigs>0)

This is effectively a logical comparison. You're asking for the TRUE / FALSE result of bwght$cigs OR bwght$cigs>0, and then taking the mean of that. mean() does accept logical vectors, counting TRUE as 1 and FALSE as 0, so what you get back is the share of rows where the condition holds (the proportion of smokers), not the average of cigs.
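To see what that expression actually computes, here is a quick check on a made-up vector (the values are invented for the example):

```r
cigs <- c(0, 0, 10, 20, 0, 30)   # made-up values

# In a logical operation, non-zero numbers count as TRUE and zeros as FALSE
either <- cigs | cigs > 0
print(either)          # FALSE FALSE TRUE TRUE FALSE TRUE

# mean() of a logical vector is the share of TRUE values
share_smokers <- mean(either)
print(share_smokers)   # 0.5
```

So the call runs without complaint, but the result is a proportion, which is why it looks nothing like an average number of cigarettes.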

mean(bwght$cigs>0 | bwght$cigs=TRUE)

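Same problem, plus a syntax issue: = inside a function call is read as naming an argument, so this does not parse; the equality test is ==. Even with that fixed, | returns a logical vector, so mean() would again average TRUE / FALSE values rather than cigs.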

if(bwght$cigs > 0){sum(bwght$cigs)}

By any chance, were you a SAS programmer originally? This looks like how I used to type at first. Basically, if() doesn't work the same way in R as it does in SAS. if() expects a single TRUE or FALSE, but bwght$cigs > 0 is a vector with one value per row. Older versions of R look only at the first element and warn; since R 4.2.0 a condition of length greater than one is an error. R handles looping differently from SAS; check out functions like lapply, tapply, and so on.
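As a rough sketch of the vectorized style hinted at above, tapply can compute the mean for smokers and non-smokers in one call. The vector below is made up for the example.

```r
cigs <- c(0, 0, 10, 20, 0, 30)   # made-up values

# The comparison yields one TRUE/FALSE per element, not a single condition
is_smoker <- cigs > 0
print(length(is_smoker))   # 6

# Group means in one call; the result is named "FALSE" (non-smokers) and "TRUE" (smokers)
group_means <- tapply(cigs, is_smoker, mean)
print(group_means["TRUE"])
```

No explicit loop or if() is needed: the logical vector acts as the grouping variable.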

x <-as.numeric(bwght$cigs, rm="0")
mean(x)

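as.numeric() has no rm argument, and as far as I can tell the extra argument is just ignored, so x is simply cigs again and mean(x) gives the same overall mean. Dropping the quotes around "0" would not change that.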



Answer 2:

mean(bwght[bwght$cigs>0,"cigs"])

I found that this statement failed with the warning "argument is not numeric or logical: returning NA".

Converting to matrix solved this:

mean(data.matrix(bwght[bwght$cigs>0,"cigs"]))
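One plausible cause, though the thread does not say how bwght was loaded, is that the subset came back as a one-column data frame instead of a vector (tibbles behave this way). The sketch below reproduces that situation in base R with drop = FALSE on a made-up data frame, and shows [[ ]] as an alternative to data.matrix().

```r
toy <- data.frame(cigs = c(0, 0, 10, 20, 0, 30))   # made-up values

# drop = FALSE keeps the result as a data frame, mimicking what a tibble returns
sub_df <- toy[toy$cigs > 0, "cigs", drop = FALSE]
print(is.data.frame(sub_df))            # TRUE

# mean() on a data frame returns NA with the "not numeric or logical" warning
bad <- suppressWarnings(mean(sub_df))

# [[ ]] always extracts the column as a plain vector
good <- mean(sub_df[["cigs"]])
print(good)                             # 20
```

If this is indeed what happened, mean(bwght$cigs[bwght$cigs > 0]) would sidestep the issue as well, since $ also returns the column as a vector.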


Tags: r condition mean