Question:
I'm working in R. I want the shared columns of 2 data.tables (shared meaning same column name) to have matching classes. I'm struggling with a way to generically convert an object of unknown class to the unknown class of another object.
More context:
I know how to set the class of a column in a data.table, and I know about the as function. Also, this question isn't entirely data.table specific, but it comes up often when I use data.tables. Further, assume that the desired coercion is possible.
I have 2 data.tables. They share some column names, and those columns are intended to represent the same information. For the column names shared by table A and table B, I want the classes of A to match those in B (or other way around).
Example data.tables:
A <- structure(list(year = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), stratum = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L)), .Names = c("year", "stratum"), row.names = c(NA, -45L), class = c("data.table", "data.frame"))
B <- structure(list(year = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), stratum = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L), bt = c(-9.95187702337873, -9.48946944434626, -9.74178662514147, -5.36167545158338, -4.76405522202426, -5.41964239804882, -0.0807951335119085, 0.520481719699774, 0.0393874225863578, 5.40557402913123, 5.47927931969583, 5.37228402911139, 9.82774396910091, 9.89629694010177, 9.98105260936272, -9.82469892896284, -9.42530210357904, -9.66171049964775, -5.17540952901709, -4.81859082470115, -5.3577146169737, -0.0685310909609001, 0.441383303157166, -0.0105897444321987, 5.24205882775199, 5.65773605162835, 5.40217185632441, 9.90299445851434, 9.78883672575814, 9.98747998379124, -9.69843398105195, -9.31530717395811, -9.77406601252698, -4.83080164375344, -4.89056304189872, -5.3904000267275, -0.121508487954861, 0.493798577602088, -0.118550709142654, 5.23654772583187, 5.87760447006892, 5.22478092346285, 9.90949768116403, 9.85433376398086, 9.91619307289277), yr = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), .Names = c("year", "stratum", "bt", "yr"), row.names = c(NA, -45L), class = c("data.table", "data.frame"), sorted = c("year", "stratum"))
Here's what they look like:
> A
   year stratum
1:    1       1
2:    1       2
3:    1       3
4:    1       4
> B
   year stratum          bt yr
1:    1       1 -9.95187702  1
2:    1       2 -9.48946944  1
3:    1       3 -9.74178663  1
4:    1       4 -5.36167545  1
Here are the classes:
> sapply(A, class)
     year   stratum
"integer" "integer"
> sapply(B, class)
     year   stratum        bt        yr
"numeric" "integer" "numeric" "numeric"
Manually, I can accomplish the desired task through the following:
A[,year:=as.numeric(year)]
This is easy when there's only one column to change, you know that column ahead of time, and you know the desired class ahead of time. It's also pretty easy to convert arbitrary columns to a given class.
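For reference, the "arbitrary columns, known class" case mentioned above could look like the following minimal sketch. The table DT and the vector cols are made up for illustration; only the target class (numeric) is assumed to be known in advance.

```r
library(data.table)

# Toy table for illustration only (not the A/B tables from the question)
DT <- data.table(a = 1:3, b = 4:6, c = c("x", "y", "z"))

# Columns to convert, chosen at run time
cols <- c("a", "b")

# Convert the selected columns by reference to a class known ahead of time
DT[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]

sapply(DT, class)
# a and b are now "numeric"; c is still "character"
```

The hard part of the question is the case where as.numeric itself is not known when the code is written.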
My Failed Attempt:
(EDIT: This actually works; see my answer)
s2c <- function(x, type = "list") {
  as.call(lapply(c(type, x), as.symbol))
}
# In this case, I can assume all columns of A can be found in B
# I am also able to assume that the desired conversion is possible
B.class <- sapply(B[, eval(s2c(names(A)))], class)
for (col in names(A)) {
  set(A, j = col, value = as(A[[col]], B.class[col]))
}
But this still returns the year column as "integer", not "numeric":
> sapply(A, class)
     year   stratum
"integer" "integer"
The problem in the above example is that class(as(1L, "numeric")) still returns "integer". On the other hand, class(as.numeric(1L)) returns "numeric"; however, I don't know ahead of time that as.numeric is needed.
Question, Restated:
How do I make the column classes match, when neither the columns nor the to/from classes are known ahead of time?
Additional Thoughts:
In a way, the question is mostly about arbitrary class matching. I run into this issue often with data.table because it's very vocal about class matching. E.g., I run into similar problems when I need to insert an NA of the appropriate type (NA_real_ vs NA_character_, etc.), depending on the class of the column (see the related question/issue in This Question).
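As a small illustration of the typed-NA problem, one base R trick is to index a vector with NA_integer_, which yields a length-one NA of the same type as the vector. The helper name na_like below is hypothetical, introduced only for this sketch.

```r
# Hypothetical helper: return a single NA of the same type as x.
# Indexing with NA_integer_ gives a length-one result whose type
# (and attributes such as factor levels) follows x.
na_like <- function(x) x[NA_integer_]

typeof(na_like(1:3))          # "integer"
typeof(na_like(c(1.5, 2.5)))  # "double"
typeof(na_like(c("a", "b")))  # "character"
class(na_like(factor("a")))   # "factor"
```

This sidesteps choosing between NA_real_, NA_character_, and so on by hand, at least for atomic vectors and factors.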
Again, this question can be seen as a general issue of converting between arbitrary classes that aren't known in advance. In the past, I've written functions using switch to do something like switch(class(x), double = as.numeric(...), character = as.character(...), ...), but that seems a bit ugly. The only reason I'm bringing this up in the context of data.table is because it's where I most often encounter the need for this type of functionality.
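One possible alternative to a hand-written switch, sketched here under assumptions, is to look up the as.<class>() converter by name from the class of the target column. The helper coerce_like is hypothetical (not part of base R or data.table); it only looks at the first element of class() and assumes a function named as.<class> exists for it. Small toy tables stand in for A and B.

```r
library(data.table)

# Hypothetical helper: coerce x to the (first) class of `template`
# by looking up the matching as.<class>() function by name.
coerce_like <- function(x, template) {
  target <- class(template)[1L]
  if (identical(class(x)[1L], target)) return(x)  # already matching
  converter <- match.fun(paste0("as.", target))   # errors if no such function
  converter(x)
}

# Toy stand-ins for the tables in the question
A <- data.table(year = rep(1:3, each = 2L), stratum = rep(1:2, times = 3L))
B <- data.table(year = as.numeric(rep(1:3, each = 2L)),
                stratum = rep(1:2, times = 3L),
                bt = rnorm(6))

# Align every shared column of A to the class used in B, by reference
for (col in intersect(names(A), names(B))) {
  set(A, j = col, value = coerce_like(A[[col]], B[[col]]))
}

sapply(A, class)
# year is now "numeric"; stratum stays "integer"
```

This relies on the naming convention as.numeric, as.character, as.integer, and so on, so it fails with an error for classes that have no such converter. It also inherits each converter's quirks; for example, as.numeric on a factor returns the underlying integer codes rather than the labels.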