data.table objects assigned with := from within fu

Posted 2018-12-31 21:11

Question:

I would like to modify a data.table within a function. If I use := inside the function, the returned table is printed only on the second attempt to print it.

Look at the following illustration:

library(data.table)
mydt <- data.table(x = 1:3, y = 5:7)

myfunction <- function(dt) {
    dt[, z := y - x]
    dt
}

When I only call the function, the table is not printed (which is the standard behaviour). However, if I save the returned data.table into a new object, it is not printed the first time I print that object, only the second time.

myfunction(mydt)  # nothing is printed   
result <- myfunction(mydt) 
result  # nothing is printed
result  # for the second time, the result is printed
mydt                                                                     
#    x y z
# 1: 1 5 4
# 2: 2 6 4
# 3: 3 7 4 

Could you explain why this happens and how to prevent it?

Answer 1:

As David Arenburg mentions in a comment, the answer can be found here. A bug was fixed in version 1.9.6, but the fix introduced this side effect.

To prevent this behaviour, add a trailing [] to the data.table expression at the end of the function (the DT[] idiom).

myfunction <- function(dt) {
    dt[, z := y - x][]
}
myfunction(mydt)  # prints immediately
#    x y z
# 1: 1 5 4
# 2: 2 6 4
# 3: 3 7 4 
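If you want to confirm that the column really was added, independently of whether anything shows up on the console, you can inspect the table instead of printing it. This is only a small sketch reusing the names from the question; the checks at the end are my own addition.

```r
library(data.table)

mydt <- data.table(x = 1:3, y = 5:7)

# Same function as above: := adds the column in place,
# and the trailing [] returns the table so that it prints.
myfunction <- function(dt) {
  dt[, z := y - x][]
}

result <- myfunction(mydt)

# Checks that do not rely on printing the table
"z" %in% names(mydt)          # is the new column in the caller's table?
identical(result$z, mydt$z)   # do both names show the same values?
```

Both checks look at the data directly, so they behave the same whether or not a print was suppressed.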


Answer 2:

I'm sorry if I'm not supposed to post something here that's not an answer, but my post is too long for a comment.

I'd like to point out that janosdivenyi's solution of adding a trailing [] to dt does not always give the expected results (even when using data.table 1.9.6 or 1.10.4), as I show below.

The examples below show that if dt is the last line in the function one gets the desired behaviour without the presence of the trailing [], but if dt is not on the last line in the function then a trailing [] is needed to get the desired behaviour.

The first example shows that with no trailing [] on dt we get the expected behaviour when dt is on the last line of the function:

mydt <- data.table(x = 1:3, y = 5:7)

myfunction <- function(dt) {
  df <- 1
  dt[, z := y - x]
}

myfunction(mydt)  # Nothing printed as expected

mydt  # Content printed as desired
##    x y z
## 1: 1 5 4
## 2: 2 6 4
## 3: 3 7 4

Adding a trailing [] on dt gives unexpected behaviour:

mydt <- data.table(x = 1:3, y = 5:7)

myfunction <- function(dt) {
  df <- 1
  dt[, z := y - x][]
}

myfunction(mydt)  # Content printed unexpectedly
##    x y z
## 1: 1 5 4
## 2: 2 6 4
## 3: 3 7 4

mydt  # Content printed as desired
##    x y z
## 1: 1 5 4
## 2: 2 6 4
## 3: 3 7 4

Moving df <- 1 after the dt line, with no trailing [], gives unexpected behaviour:

mydt <- data.table(x = 1:3, y = 5:7)

myfunction <- function(dt) {
  dt[, z := y - x]
  df <- 1
}

myfunction(mydt)  # Nothing printed as expected

mydt  # Nothing printed unexpectedly

Moving df <- 1 after the dt line, with a trailing [], gives the expected behaviour:

mydt <- data.table(x = 1:3, y = 5:7)

myfunction <- function(dt) {
  dt[, z := y - x][]
  df <- 1
}

myfunction(mydt)  # Nothing printed as expected

mydt  # Content printed as desired
##    x y z
## 1: 1 5 4
## 2: 2 6 4
## 3: 3 7 4
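For the case where := is not the last expression and the function returns something other than the table, one option is to leave the function alone and ask for the print explicitly at the call site with a trailing []. A minimal sketch, assuming a hypothetical helper add_diff (not part of the examples above) that returns the row count:

```r
library(data.table)

mydt <- data.table(x = 1:3, y = 5:7)

# Hypothetical helper: := is not the last expression,
# so the function returns the row count, not the table.
add_diff <- function(dt) {
  dt[, z := y - x]   # modifies the caller's table by reference
  nrow(dt)
}

n <- add_diff(mydt)

mydt[]   # trailing [] at the call site requests printing explicitly
```

This keeps the function free of printing concerns, at the cost of having to remember the [] wherever the table is first displayed after the call.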