In data.table it is possible to have columns of type list, and I'm trying for the first time to benefit from this feature. For each row of my table dt, I need to store several comments taken from an rApache web service. Each comment will have a username, datetime, and body item.
Instead of using long strings with some unusual character to separate each message from the others (like |) and a ; to separate each item in a comment, I thought to use lists like this:
library(data.table)
dt <- data.table(id=1:2,
                 comment=list(
                   list(
                     list(username="michele", date=Sys.time(), message="hello"),
                     list(username="michele", date=Sys.time(), message="world")),
                   list(
                     list(username="michele", date=Sys.time(), message="hello"),
                     list(username="michele", date=Sys.time(), message="world"))))
> dt
id comment
1: 1 <list>
2: 2 <list>
to store all the comments added for one particular row (also because it will be easier to convert to JSON later on, when I need to send it back to the UI).
However, when I try to simulate how I will actually be filling my table in production (adding a single comment to a particular row), R either crashes, or doesn't assign what I would like and then crashes:
> library(data.table)
> dt <- data.table(id=1:2, comment=vector(mode="list", length=2))
> dt$comment
[[1]]
NULL
[[2]]
NULL
> dt[1L, comment := 1] # this works
> dt$comment
[[1]]
[1] 1
[[2]]
NULL
> set(dt, 1L, "comment", list(1, "a")) # assigns only `1`, and R crashes when I try to print `dt`
Warning message:
In set(dt, 1L, "comment", list(1, "a")) :
Supplied 2 items to be assigned to 1 items of column 'comment' (1 unused)
> dt[1L, comment := list(1, "a")] # R crashes as soon as I run
> dt[1L, comment := list(list(1, "a"))] # any of these two
I know I'm trying to misuse data.table; e.g., the way the j argument has been designed allows this:
dt[1L, c("id", "comment") := list(1, "a")] # lists in RHS are seen as different columns! not parts of one
Question: So, is there a way to do the assignment I want? Or do I just have to take dt$comment out into a variable, modify it, and then re-assign the whole column every time I need to do an update?
Using :=, for the last case (assigning a list to a single cell) you'll need one more list, because data.table uses list(.) to look for values to assign to columns by reference. The same applies when using set.

HTH
I'm using the current development version 1.9.3, but it should work fine on any other version.
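A minimal sketch of the := and set() assignments discussed above, assuming a reasonably recent data.table release. The values are placeholders, and add_comment() and dt2 are hypothetical names introduced only for illustration; they are not part of data.table.

```r
library(data.table)

dt <- data.table(id = 1:2, comment = vector(mode = "list", length = 2))

# One value per cell: a length-2 list fills the two rows in order.
dt[, comment := list(1, "a")]

# A whole list into the single cell of row 1. The outer list() is the
# wrapper that := unpacks, the middle one is the one-row slice of the
# column, and the innermost list is the value stored in the cell.
dt[1L, comment := list(list(list(1, "a")))]

# The same kind of assignment through set(), here for row 2.
set(dt, i = 2L, j = "comment", value = list(list(list(2, "b"))))

# Hypothetical helper (not part of data.table): append one comment to
# the list already stored in a cell.
add_comment <- function(dt, row, new_comment) {
  old <- dt$comment[[row]]  # NULL for an empty cell, so c() still works
  set(dt, i = row, j = "comment",
      value = list(list(c(old, list(new_comment)))))
  invisible(dt)
}

dt2 <- data.table(id = 1:2, comment = vector(mode = "list", length = 2))
add_comment(dt2, 1L, list(username = "michele", message = "hello"))
add_comment(dt2, 1L, list(username = "michele", message = "world"))
length(dt2$comment[[1L]])  # number of comments stored for row 1
```

Because set() updates by reference, the helper modifies the table in place, so there is no need to pull the whole column out and re-assign it.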
Just to add more info, what list columns are really designed for is when each cell is itself a vector:
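A small sketch of such a column, assuming a recent data.table; the column names a and b and the values are illustrative only.

```r
library(data.table)

# b is a list column whose cells are integer vectors of different lengths.
DT <- data.table(a = 1:2, b = list(1:5, 1:10))

print(DT)             # each cell of b is shown as comma-separated values
sapply(DT$b, length)  # length of the vector held in each cell
```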
Notice the pretty printing of the vectors in the b column. Those commas are just for display; each cell is actually a vector (as shown by the sapply command above). Note also the trailing comma on the 2nd item of b. That indicates that the vector is longer than displayed (data.table just displays the first 6 items).

Or, more like your example:
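One way this could look for the comments table, as a sketch only: each cell holds an unnamed character vector in the assumed order username, date, message. Storing the timestamp as text via format() is an assumption made here so that all three items fit in one character vector.

```r
library(data.table)

# Timestamp kept as text so it can sit in a character vector with the
# other two items (an assumption for this sketch).
now <- format(Sys.time())

DT <- data.table(
  id = 1:2,
  comment = list(
    c("michele", now, "hello"),  # username, date, message
    c("michele", now, "world")
  )
)

print(DT)              # the vectors are printed inline instead of <list>
DT$comment[[2L]][3L]   # message of the comment stored in row 2
```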
What you're trying to do is not only have a list column, but put a list into each cell as well, which is why <list> is being displayed. Additionally, if you place named lists into each cell, beware that all those names will use up space. Where possible, a list column of vectors may be easier.