Including a “Hash Table” in a package

2019-05-25 19:23发布

问题:

I am in the process of putting together a package I've been working on for almost a year now. I have what I call a hash table that a syllable look up function requires. The hash table is really just an environment (I think I'm not computer whiz) that's a look up table. You can see the function I create it with below. I have a data set DICTIONARY(about 20,000 words) that will load when the package is loaded. I also what this DICTIONARY to be passed to the hash function to create a new environment when the package is loaded; something like env <- hash(DICTIONARY) as htis is how I load the environment now. How do I make a function run on start up when the package is loaded so that this new environment is created for those using my package?

hash <- function(x, type = "character") {
    e <- new.env(hash = TRUE, size = nrow(x), parent = emptyenv())
    char <- function(col) assign(col[1], as.character(col[2]), envir = e)
    num <- function(col) assign(col[1], as.numeric(col[2]), envir = e)
    FUN <- if(type=="character") char else num
    apply(x, 1, FUN)
    return(e)
}

#currently how I load the environment with the DICTIONARY lookup table
env <- hash(DICTIONARY) 

Here's the head of DICTIONARY if it's helpful:

    word syllables
1     hm         1
2    hmm         1
3   hmmm         1
4   hmph         1
5  mmhmm         2
6   mmhm         2
7     mm         1
8    mmm         1
9   mmmm         1
10   pff         1

Many of you may be thinking "This is up to the user to determine if they want the environment loaded". Valid point but the intended audience of this package is people in the literacy field. Not many in that field are R users and so I have to make this thing as easy as possible to use. Just wanted to get out the philosophy of why I want to do this, out there so that it doesn't become a point of contention.

Thank you in advance. (PS I've looked at this manual (LINK) but can't seem to locate any info about this topic)

EDIT: Per Andrei's suggestion i think it will be something like this? But I'm not sure. Does this load after all the other functions and data sets in the package load? This stuff is a little confusing to me.

.onLoad <- function(){
   env <- hash(DICTIONARY)
}

回答1:

If the hash is going to change infrequently (this seems like the case, from your problem description), then save the hash into your package source tree as

save(env, file="<my_pkg>/R/sysdata.rda")

After installing the package, env will be available inside the name space, my_pkg:::env. See section 1.1.3 of "Writing R Extensions". You might have a script, say in "/inst/scripts/make_env.R" that creates env, and that you as the developer use on those rare occasions when env needs to be updated.

Another possibility is that the hash changes, but only on package installation. Then the solution is to write code that is evaluated at package installation. So in a file /R/env.R write something along the lines of

env <- local({
    localenv <- new.env(parent=emptyenv())
    ## fill up localenv, then return it
    localenv[["foo"]] = "bar"
    localenv
})

The possibility solved by .onLoad is that the data changes each time the package is loaded, e.g., because it is retrieving an update from some on-line source.

env <- new.env(parent=emptyenv())

.onLoad <- function(libname, pkgname)
{
    ## fill up env
    env[["foo"]] = "bar"
}


标签: r package