Can .SD be viewed from a browser within [.data.tab

Posted 2019-02-08 11:45

Question:

While constructing expressions to put in the j-slot of a [.data.table call, it would often be helpful to be able to examine and play around with the contents of .SD.

This naive attempt doesn't work...

library(data.table)
DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)

DT[, browser(), by=x]
# Called from: `[.data.table`(DT, , browser(), by = x)
Browse[1]> 
Browse[1]> .SD
# NULL data.table

... even though a variable named .SD, along with several others related to the current data.table subset, is present in the local environment:

Browse[1]> ls(all.names = TRUE)
#  [1] ".BY"       ".GRP"      ".I"        ".iSD"      ".N"        ".SD"      
#  [7] "Cfastmean" "mean"      "print"     "x"        
Browse[1]> .N
# [1] 3
Browse[1]> .I
# [1] 4 5 6

Using .I, I can view something more or less like .SD (here with the grouping column x still included), but it would be nice to be able to access its value directly:

Browse[1]> DT[.I]
#    x y v
# 1: b 1 4
# 2: b 3 5
# 3: b 6 6

My questions: Why is the expected value of .SD not directly available from within a browser() call (while .I, .N, .GRP and .BY are)? Is there some alternative way to access the value of .SD?

Answer 1:

Updated in light of Matthew Dowle's comments:

It turns out that .SD is, internally, the environment within which all j expressions are evaluated, including those which don't explicitly reference .SD at all. Filling it with all of DT's columns for each subset of DT is not cheap, timewise, so [.data.table() won't do so unless it really needs to.

Instead, making great use of R's lazy-evaluation of arguments, it previews the unevaluated j expression, and only adds to .SD columns that are referenced therein. If .SD itself is mentioned, it adds all of DT's columns.
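The idea of previewing an unevaluated expression can be illustrated in base R with all.vars(), which lists the variable names an expression mentions without evaluating it. The sketch below is only an approximation of the behaviour described above, not data.table's actual internals; referenced_cols() is a hypothetical helper invented for this illustration, and the column names are those of the example DT.

```r
# Hypothetical helper (not part of data.table): given an unevaluated j
# expression, guess which columns would need to be made available.
referenced_cols <- function(j, cols, by = character()) {
  candidates <- setdiff(cols, by)   # grouping columns are not part of .SD
  vars <- all.vars(j)               # variable names only; function names are skipped
  if (".SD" %in% vars) {
    return(candidates)              # .SD itself is mentioned: assume every column is wanted
  }
  intersect(candidates, vars)       # otherwise keep only the columns named in j
}

cols <- c("x", "y", "v")            # column names of the example DT

referenced_cols(quote(.N * y), cols, by = "x")                     # "y"
referenced_cols(quote(browser()), cols, by = "x")                  # character(0)
referenced_cols(quote(if (nrow(.SD)) browser()), cols, by = "x")   # "y" "v"
```

Under this approximation, a bare browser() call mentions no columns at all, which matches the empty .SD seen in the question.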

So, to view .SD, just include some reference to it in the j-expression. Here is one of many expressions that will work:

library(data.table)
DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)

## This works
DT[, if(nrow(.SD)) browser(), by=x]
# Called from: `[.data.table`(DT, , if (nrow(.SD)) browser(), by = x)
Browse[1]> .SD
#    y v
# 1: 1 1
# 2: 3 2
# 3: 6 3

And here are a couple more:

DT[,{.SD; browser()}, by=x]
DT[,{browser(); .SD}, by=x]  ## Notice that order doesn't matter
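When stepping through each group interactively is not required, one possible alternative is to keep a copy of every group's .SD in a list column and examine the copies afterwards at the normal prompt. This is a minimal sketch rather than a recommended pattern; the names snap and sd_copy are made up for the example, and copy() is used as a precaution so that each stored element is an independent table.

```r
library(data.table)
DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)

# Mentioning .SD in j makes all non-grouping columns available;
# list(copy(.SD)) stores one snapshot per group in a list column.
snap <- DT[, .(sd_copy = list(copy(.SD))), by = x]

# Pull out the snapshot for group "b" and inspect it like any data.table
sd_b <- snap[x == "b", sd_copy[[1]]]
names(sd_b)
sd_b
```

Each element of snap$sd_copy can then be explored freely, without being inside a browser() session.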

To see for yourself that .SD just loads columns needed by the j-expression, run these each in turn (typing .SD when entering the browser environment, and Q to leave it and return to the normal command-line):

DT[, {.N * y ; browser()}, by=x]
DT[, {v^2 ; browser()}, by=x]
DT[, {y*v ; browser()}, by=x]