Check if character value is a valid R object name

Posted 2019-01-24 21:34

Question:

Several months ago I asked something similar, but back then I was using JavaScript to check whether a given string is a "valid" R object name. Now I'd like to do the same using nothing but R. I suspect there's a neat way to do this with some (not so) esoteric R function, so regular expressions seem like the last line of defence. Any ideas?

Oh, yeah, using back-ticks and stuff is considered cheating. =)

Answer 1:

Edited 2013-1-9 to fix regular expression. Previous regular expression, lifted from page 456 of John Chambers' "Software for Data Analysis", was (subtly) incomplete. (h.t. Hadley Wickham)


There are a couple of issues here. A simple regular expression can identify all syntactically valid names, but some of those names (like if and while) are 'reserved' and cannot be assigned to.

  • Identifying syntactically valid names:

?make.names explains that a syntactically valid name:

[...] consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number. Names such as '".2way"' are not valid [...]

Here is the corresponding regular expression:

  "^([[:alpha:]]|[.][._[:alpha:]])[._[:alnum:]]*$"
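
To see what this pattern accepts and rejects, here is a minimal sketch that runs it over a handful of strings with grepl(). The input vector is made up for illustration; only the pattern itself comes from the answer.

```r
# Pattern copied verbatim from the answer above.
pattern <- "^([[:alpha:]]|[.][._[:alpha:]])[._[:alnum:]]*$"

# Illustrative inputs (hypothetical, chosen for this sketch).
candidates <- c("mean", ".j_j", "...", ".", "_jj", "  j", ".2way", "2way")

# grepl() is vectorised; setNames() just labels each result with its input.
matches <- setNames(grepl(pattern, candidates), candidates)
print(matches)

# For comparison: what make.names() does with a lone dot.
print(make.names("."))
```

In this sketch only "mean", ".j_j" and "..." match. A lone "." is not matched, because the second branch of the pattern requires a character after the leading dot, whereas make.names(".") appears to return it unchanged. The two checks can therefore disagree on that one input.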
  • Identifying unreserved syntactically valid names

To identify unreserved names, you can take advantage of the base function make.names(), which constructs syntactically valid names from arbitrary character strings.

    isValidAndUnreserved <- function(string) {
        make.names(string) == string
    }

    isValidAndUnreserved(".jjj")
    # [1] TRUE
    isValidAndUnreserved(" jjj")
    # [1] FALSE
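
The comparison works because make.names() rewrites anything that is not already a valid name, so the result no longer equals the input. A small sketch of that, with made-up inputs, puts the original and the rewritten string side by side:

```r
# Same helper as in the answer above.
isValidAndUnreserved <- function(string) {
    make.names(string) == string
}

# Illustrative inputs (hypothetical, chosen for this sketch).
inputs <- c(".jjj", " jjj", "a b", "2way")

# One row per input: the original string, what make.names() turns it into,
# and whether the two are identical.
result <- data.frame(
    input = inputs,
    fixed = make.names(inputs),
    same  = isValidAndUnreserved(inputs),
    stringsAsFactors = FALSE
)
print(result)
```

Only ".jjj" comes back unchanged. The others either have invalid characters replaced by a dot or get an "X" prepended.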
  • Putting it all together

    isValidName <- function(string) {
        grepl("^([[:alpha:]]|[.][._[:alpha:]])[._[:alnum:]]*$", string)
    }
    
    isValidAndUnreservedName <- function(string) {
        make.names(string) == string
    }
    
    testValidity <- function(string) {
        valid <- isValidName(string)
        unreserved <- isValidAndUnreservedName(string)
        reserved <- (valid & ! unreserved)
        list("Valid"=valid,
             "Unreserved"=unreserved,
             "Reserved"=reserved)
    }
    
    testNames <- c("mean", ".j_j", "...", "if", "while", "TRUE", "NULL",
                   "_jj", "  j", ".2way") 
    t(sapply(testNames, testValidity))
    
          Valid Unreserved Reserved
    mean  TRUE  TRUE       FALSE   
    .j_j  TRUE  TRUE       FALSE   
    ...   TRUE  TRUE       FALSE   
    if    TRUE  FALSE      TRUE    
    while TRUE  FALSE      TRUE    
    TRUE  TRUE  FALSE      TRUE    
    NULL  TRUE  FALSE      TRUE    
    _jj   FALSE FALSE      FALSE   
      j   FALSE FALSE      FALSE   # Note: these tests are for "  j", not "j"
    .2way FALSE FALSE      FALSE  
    

For more discussion of these issues, see the r-devel thread linked to by @Hadley in the comments below.



Answer 2:

As Josh suggests, make.names is probably the best solution to this. Not only will it handle weird punctuation, it'll also flag reserved words:

make.names(".x")   # ".x"
make.names("_x")   # "X_x"
make.names("if")   # "if."
make.names("function")  # "function."
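
If you want to pick out the reserved words specifically, one possible sketch relies on the rule described in ?make.names that names matching R keywords get a dot appended. The helper name isReservedWord is hypothetical, not a base R function, and the input vector is made up for illustration.

```r
# Hypothetical helper: TRUE when make.names() did nothing except append a
# dot, which is how it marks names that match R keywords.
isReservedWord <- function(string) {
    make.names(string) == paste0(string, ".")
}

# Illustrative inputs (chosen for this sketch).
words <- c("if", "function", "TRUE", "NULL", "mean", "_x", ".x")

flags <- setNames(isReservedWord(words), words)
print(flags)
```

Here the first four come out TRUE and the rest FALSE. "_x" is invalid but not reserved, so it gets an "X" prefix rather than a trailing dot.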