I've got some code that generates stratified weighted means, and I'm certain it worked a few months ago, but I'm not sure what the problem is now. (I apologize; this must be very basic stuff):
```r
library(data.table)

dp <- structure(list(seqn = c(1L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 13L, 3L, 4L, 9L, 10L, 11L, 14L, 8L, 11L, 12L, 10L,
5L, 13L, 2L, 14L, 3L, 9L, 6L, 7L), sex = c(2L, 1L, 2L, 2L, 1L,
2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), bmi = c(22.8935608711259,
27.0944623781918, 40.4637162938634, 23.7649712675423, 15.3193372705538,
31.1280302540991, 21.4866354393239, 20.3200254374398, 32.331092513536,
25.3679771839413, 33.9400508162971, 14.7048592172926, 25.5243757788688,
23.4331882363495, 27.6428134168995, 29.3923629426172, 24.9547209666314,
17.0522203606383, 15.51, 22, 30.62, 30.94, 29.1, 25.57, 24.9,
27.33, 17.63, 18.48, 22.56, 29.39), tc = c(273L, 181L, 150L,
201L, 142L, 165L, 235L, 219L, 298L, 222L, 143L, 134L, 268L, 160L,
236L, 225L, 260L, 140L, 162L, 132L, 156L, 140L, 279L, 314L, 215L,
174L, 129L, 148L, 153L, 245L), swt = c(1645, 3318, 2280, 1574,
4062, 1627, 14604, 24675, 975, 975, 2697, 1559, 1737.58, 1730.23,
19521.36, 28080.57, 1248.43, 13745.77, 5251.76464426326, 6497.194885522,
15915.7023420765, 3740.96809540218, 16574.177622509, 307.32513798849,
4720.89748295751, 3247.78896499604, 7698.70949077031, 1262.6450411464,
6609.43340735515, 4254.23723479882)), .Names = c("seqn", "sex",
"bmi", "tc", "swt"), row.names = c(20560L, 20561L, 20562L, 20563L,
20565L, 20566L, 20567L, 20568L, 20569L, 20570L, 20571L, 20572L,
61335L, 61336L, 61338L, 61339L, 61340L, 61341L, 95465L, 96890L,
104613L, 105988L, 107581L, 112267L, 113403L, 114292L, 119979L,
120271L, 125939L, 135699L), class = "data.frame")

dt <- data.table(dp, key = 'sex')

# This works: overall weighted mean of every column
sapply(dp, function(x) weighted.mean(x, dp$swt))

# This also works: overall unweighted mean
dt[, lapply(.SD, mean, na.rm = TRUE), .SDcols = c('bmi', 'tc', 'swt')]

# But the weighted mean stratified by sex fails:
dt[, lapply(.SD, function(x) weighted.mean(x, swt, na.rm = TRUE)), by = key(dt), .SDcols = c('bmi', 'tc', 'swt')]
```
But this gives the error:

```
Error in weighted.mean.default(x, swt, na.rm = TRUE) : object 'swt' not found
```
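For comparison, the same stratified weighted means can be computed in base R without data.table: split the rows by stratum, then call `weighted.mean()` on each column within each piece. This is a minimal sketch; the data frame `toy` and the helper `strat_wmean()` are hypothetical, with made-up values small enough to check by hand.

```r
# Hypothetical toy data standing in for `dp`: two strata (sex), weights in swt
toy <- data.frame(
  sex = c(1L, 1L, 2L, 2L),
  bmi = c(20, 30, 25, 35),
  tc  = c(150, 250, 200, 180),
  swt = c(1, 3, 2, 2)
)

# Hypothetical helper: split rows by the `by` column, then take the weighted
# mean of each column in `cols` using the weight column `w` within each stratum.
# Returns a matrix with one row per stratum and one column per variable.
strat_wmean <- function(d, cols, w, by) {
  t(sapply(split(d, d[[by]]), function(g) {
    sapply(g[cols], function(x) weighted.mean(x, g[[w]], na.rm = TRUE))
  }))
}

res <- strat_wmean(toy, c("bmi", "tc"), "swt", "sex")
print(res)
```

Here stratum 1 gives bmi = (20*1 + 30*3)/4 = 27.5, which is the kind of number the failing data.table query should return per `sex`.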
```
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.8.6

loaded via a namespace (and not attached):
[1] tools_2.15.2
```
UPDATE (from Arun): This is now fixed in v1.8.11. From NEWS:
This is indeed a bug, introduced somewhere between 1.8.2 and 1.8.6.

To work around it in the meantime, either turn off optimization, or don't wrap `weighted.mean` in `function()` at all: pass it to `lapply` directly, with `swt` and `na.rm=TRUE` as extra arguments.

We are making more use of optimization now, but this case slipped through the test suite: tests 825.1, 825.2 and 825.3 didn't cover an argument to a function being another column, within an anonymous `function()`. It would be a real problem where no existing function can be passed directly; that is, unlike this case, where the `function()` can simply be omitted because `weighted.mean` already exists and can be applied as-is.

You can see how optimization modifies `j` by setting `verbose=TRUE` (either per query, or globally with `options(datatable.verbose=TRUE)`). In this case the verbose output wouldn't have revealed anything wrong; I'm just mentioning it as an aside.

Now filed as #2381: "Optimization of lapply(.SD, function() ...) no longer sees columns inside ...". Will fix and add tests so this can't regress again.
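Both workarounds can be sketched on a small hypothetical table (`dt` here holds made-up values, not the question's `dp`). The sketch assumes that `options(datatable.optimize = 0)` is the way to turn off data.table's optimization of `j`; the previous value is restored afterwards. On versions that include the fix, both queries should return the same result.

```r
library(data.table)

# Hypothetical toy data (two strata, made-up weights), keyed by sex
dt <- data.table(sex = c(1L, 1L, 2L, 2L),
                 bmi = c(20, 30, 25, 35),
                 tc  = c(150, 250, 200, 180),
                 swt = c(1, 3, 2, 2),
                 key = "sex")

# Workaround 1: turn off optimization of j, run the original
# anonymous-function query, then restore the previous option value
old <- options(datatable.optimize = 0)
res1 <- dt[, lapply(.SD, function(x) weighted.mean(x, swt, na.rm = TRUE)),
           by = key(dt), .SDcols = c("bmi", "tc")]
options(old)

# Workaround 2: no function() wrapper; lapply forwards swt (matched to the
# weights argument w) and na.rm = TRUE to every weighted.mean call
res2 <- dt[, lapply(.SD, weighted.mean, swt, na.rm = TRUE),
           by = key(dt), .SDcols = c("bmi", "tc")]

print(res2)
```

The second form is the more robust choice: it doesn't touch a global option, and it only works because `weighted.mean` already exists as a named function, which is exactly the distinction drawn above.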
Thanks!
I suggest keeping it simple:
I think this is also reasonably fast.
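A minimal sketch of one simple approach, assuming only a few columns need weighting so each can be named explicitly in `j` (the table `toy` and its values are hypothetical). No `lapply`, `.SD` or anonymous function is involved, so the optimization bug above never comes into play.

```r
library(data.table)

# Hypothetical toy data: two strata (sex), survey weights in swt
toy <- data.table(sex = c(1L, 1L, 2L, 2L),
                  bmi = c(20, 30, 25, 35),
                  tc  = c(150, 250, 200, 180),
                  swt = c(1, 3, 2, 2))

# One explicitly named weighted mean per column, computed within each sex
res <- toy[, list(bmi = weighted.mean(bmi, swt, na.rm = TRUE),
                  tc  = weighted.mean(tc,  swt, na.rm = TRUE)),
           by = sex]
print(res)
```

The trade-off is verbosity: every column is listed by hand, which is fine for two or three variables but not for dozens.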