How to access actual internal factor lookup hashta

Posted 2019-08-10 06:04

Question:

Dear Stackoverflow community,

I have looked everywhere but can't find the answer to this question. I am trying to access the factor lookup table that R uses when you change a string vector into a factor vector. I am not trying to convert a string to a factor but rather to get the lookup table underlying the factor variable and store it as a hash table for use elsewhere.

I encountered the problem because I want to use this factor lookup table on a list of different length vectors, to convert them from strings to numbers.

That is, I have a list of item sets that I want to convert to numeric, but each set in the list has a different number of items.

So far, I have converted the list of vectors into a single vector:

vec <- unlist(lst)  # lst: the list of character vectors (renamed so it does not shadow base::list)
vec <- factor(vec)

Now I want to do a lookup on the original list with the factor lookup table which must be underlying vec, but I can't seem to find it.

Answer 1:

I think you either want the indexes which map the elements of the factor to elements of the factor levels, as in:

vec <- c('a','b','c','b','a')
f <- factor(vec)
f
#> [1] a b c b a
#> Levels: a b c

indx <- f                 # copy the factor
attributes(indx) <- NULL  # drop levels and class, leaving the integer codes
indx
#> [1] 1 2 3 2 1

or you want the hash tables used internally to create the factor variable. Unfortunately, any hash table built while creating a factor is built inside the functions unique and match, which are internal functions, so you can't access anything they create other than their return values. If you want a hash table for looking up the elements of a character vector that shares the levels of your existing factor, just create one yourself, as in:

library(hash)
.levels <- levels(f)
h <- hash(keys = .levels, values = seq_along(.levels))
newVec <- sample(.levels, 10, replace = TRUE)
newVec
#> [1] "a" "b" "a" "a" "a" "c" "c" "b" "c" "a"
values(h,keys = newVec)
#> a b a a a c c b c a 
#> 1 2 1 1 1 3 3 2 3 1
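
For the list-of-vectors case described in the question, the same lookup can also be done in base R without the hash package. The following is only a minimal sketch: item_sets is a hypothetical stand-in for the asker's list, and using match() or a named vector is one possible approach, not what factor() exposes internally.

```r
# Hypothetical input: a list of item sets of different lengths.
item_sets <- list(c("a", "b"), c("c", "b", "a"), "b")

# Build the factor over all items, as in the question, and keep its levels.
f <- factor(unlist(item_sets))
lv <- levels(f)

# Option 1: match() returns the position of each string within the levels.
codes_match <- lapply(item_sets, match, table = lv)

# Option 2: a named integer vector serves as a string -> code lookup table.
lookup <- setNames(seq_along(lv), lv)
codes_lookup <- lapply(item_sets, function(x) unname(lookup[x]))

codes_match[[2]]
#> [1] 3 2 1
```

Both options keep the list structure intact, so each item set maps to an integer vector of the same length. A string that is not among the levels comes back as NA in either case.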