Why, for an integer vector x, does as(x, “numeric”

Posted 2019-02-03 23:27

Question:

While my question is related to this recent one, I suspect its answer(s) will have to do with the detailed workings of R's S4 object system.

What I would expect:

(TLDR; -- All indications are that as(4L, "numeric") should dispatch to a function whose body uses as.numeric(4L) to convert it to a "numeric" vector.)

Whenever one uses as(object, Class) to convert an object to the desired Class, one is really triggering a behind-the-scenes call to coerce(). coerce(), in turn, has a bunch of methods that are dispatched to based on the signature of the function call -- here the class of its first and second arguments. To see a list of all available S4 coerce() methods, one can run showMethods("coerce").
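
To make that dispatch concrete, here is a minimal sketch with two made-up S4 classes (Celsius and Fahrenheit are hypothetical names used only for illustration, not part of R). setAs() registers a coerce() method for the pair, and as() then finds and uses it:

```r
library(methods)

# Hypothetical classes, defined only for this illustration
setClass("Celsius", representation(value = "numeric"))
setClass("Fahrenheit", representation(value = "numeric"))

# setAs() stores a coerce() method with signature
# (from = "Celsius", to = "Fahrenheit")
setAs("Celsius", "Fahrenheit", function(from) {
  new("Fahrenheit", value = from@value * 9 / 5 + 32)
})

# as() looks up the coerce() method for this pair of classes and calls it
boiling <- as(new("Celsius", value = 100), "Fahrenheit")
boiling@value

# The method is registered for coerce() under that exact signature
existsMethod("coerce", c("Celsius", "Fahrenheit"))
```

The same lookup is what one would expect to happen for built-in classes such as "integer" and "numeric".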

Doing so shows that there is only one method for converting to class "numeric". It's the one with signature from="ANY", to="numeric":

showMethods("coerce")
# Function: coerce (package methods)
# from="ANY", to="array"
#      ... snip ... 
# from="ANY", to="numeric"
#      ... snip ...

That method uses as.numeric() to perform its conversion:

getMethod("coerce", c("ANY", "numeric"))
# Method Definition:
# 
# function (from, to, strict = TRUE) 
# {
#     value <- as.numeric(from)
#     if (strict) 
#         attributes(value) <- NULL
#     value
# }
# <environment: namespace:methods>
# 
# Signatures:
#         from  to       
# target  "ANY" "numeric"
# defined "ANY" "numeric"

Given its signature, and the fact that it's the only coerce() method for conversion to class "numeric", I would've expected that the function shown above is what would be dispatched to by a call to as(4L, "numeric"). That expectation is only reinforced by running the following two checks.

## (1) There isn't (apparently!) any specific method for "integer"-->"numeric"
##     conversion
getMethod("coerce", c("integer", "numeric"))
# Error in getMethod("coerce", c("integer", "numeric")) : 
#   no method found for function 'coerce' and signature integer, numeric

## (2) This says that the "ANY"-->"numeric" method will be used for "integer"-->"numeric"
##     conversion    
selectMethod("coerce",  signature=c("integer", "numeric"))
# Method Definition:
# 
# function (from, to, strict = TRUE) 
# {
#     value <- as.numeric(from)
#     if (strict) 
#         attributes(value) <- NULL
#     value
# }
# <environment: namespace:methods>
# 
# Signatures:
#         from      to       
# target  "integer" "numeric"
# defined "ANY"     "numeric"
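
The difference between the two checks (getMethod() looks for an exact match, selectMethod() also follows inheritance) can be reproduced on a toy generic. This is only a sketch; the generic describe() is a hypothetical name introduced for the example:

```r
library(methods)

# Hypothetical generic with a single catch-all method
setGeneric("describe", function(x) standardGeneric("describe"))
setMethod("describe", "ANY", function(x) "handled by the ANY method")

# Exact lookup: nothing is defined for "integer", so NULL comes back
exact <- getMethod("describe", "integer", optional = TRUE)
is.null(exact)

# Lookup with inheritance: the "ANY" method is selected for an integer
chosen <- selectMethod("describe", "integer")
chosen@target   # the signature that was asked for
chosen@defined  # the signature the method was defined with

describe(4L)
```

The target/defined pair mirrors the "Signatures:" block printed in the output above.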

What actually happens:

(TLDR; In fact, calling as(4L, "numeric") loads and dispatches to a method that does nothing at all.)

Despite all the indications mentioned above, as(4L, "numeric") does not dispatch to the coerce() method for calls with signature c("ANY", "numeric").

Here are a couple of ways to show that:

## (1) as.numeric() would do the job, but as(..., "numeric") does not
class(as(4L, "numeric"))
# [1] "integer"
class(as.numeric(4L))
# [1] "numeric"

## (2) Tracing shows that the "generic" method isn't called
trace("coerce", signature=c("ANY", "numeric"))

as(c(FALSE, TRUE), "numeric")        ## <-- It's called for "logical" vectors
# Tracing asMethod(object) on entry   
# [1] 0 1

as(c("1", "2"), "numeric")           ## <-- and for "character" vectors
# Tracing asMethod(object) on entry   
# [1] 1 2    

as(c(1L, 2L), "numeric")             ## <-- but not for "integer" vectors 
# [1] 1 2

untrace("coerce")

What method, then, is used? Well, apparently the act of calling as(4L, "numeric") adds a new S4 method to the list of methods for coerce(), and that new method is what gets used.
(Compare the results of the following calls to what they produced before we had attempted our first "integer" to "numeric" conversion.)

## At least one conversion needs to be attempted before the  
## "integer"-->"numeric" method appears.
as(4L, "numeric")  

## (1) Now the methods table shows a new "integer"-->"numeric" specific method   
showMethods("coerce")    
# Function: coerce (package methods)
# from="ANY", to="array"
#      ... snip ... 
# from="ANY", to="numeric"
#      ... snip ...
# from="integer", to="numeric"        ## <-- Here's the new method
#      ... snip ...

## (2) selectMethod now tells a different story
selectMethod("coerce",  signature=c("integer", "numeric"))
# Method Definition:
# 
# function (from, to = "numeric", strict = TRUE) 
# if (strict) {
#     class(from) <- "numeric"
#     from
# } else from
# <environment: namespace:methods>
# 
# Signatures:
#         from      to       
# target  "integer" "numeric"
# defined "integer" "numeric"

My questions:

  1. Why does as(4L, "numeric") not dispatch to the available coerce() method for signature=c("ANY", "numeric")?

  2. How/why does it instead add a new method to the S4 methods table?

  3. From where (in R's source code or elsewhere) does the definition of the coerce() method for signature=c("integer", "numeric") come?

Answer 1:

I'm not sure whether I can answer your question exhaustively, but I'll try.

The help of the as() function states:

The function ‘as’ turns ‘object’ into an object of class ‘Class’. In doing so, it applies a “coerce method”, using S4 classes and methods, but in a somewhat special way.

[...]

Assuming the ‘object’ is not already of the desired class, ‘as’ first looks for a method in the table of methods for the function ‘coerce’ for the signature ‘c(from = class(object), to = Class)’, in the same way method selection would do its initial lookup.

[...]

If no method is found, ‘as’ looks for one. First, if either ‘Class’ or ‘class(object)’ is a superclass of the other, the class definition will contain the information needed to construct a coerce method. In the usual case that the subclass contains the superclass (i.e., has all its slots), the method is constructed either by extracting or replacing the inherited slots.

This is exactly what you can see if you look at the code of the as() function (to see it, type as, without the parentheses, at the R console); see below. First it looks for an asMethod; if it can't find one, it tries to construct one; and at the end it executes it:

if (strict) 
    asMethod(object)
else asMethod(object, strict = FALSE)

When you copy-paste the code of the as() function and define your own function (let's call it myas()), you can insert a print(asMethod) above the if (strict) just mentioned to see the function used for coercing. In this case the output is:

> myas(4L, 'numeric')
function (from, to = "numeric", strict = TRUE) 
if (strict) {
    class(from) <- "numeric"
    from
} else from
<environment: namespace:methods>
attr(,"target")
An object of class “signature”
     from        to 
"integer" "numeric" 
attr(,"defined")
An object of class “signature”
     from        to 
"integer" "numeric" 
attr(,"generic")
[1] "coerce"
attr(,"generic")attr(,"package")
[1] "methods"
attr(,"class")
[1] "MethodDefinition"
attr(,"class")attr(,"package")
[1] "methods"
attr(,"source")
[1] "function (from, to = \"numeric\", strict = TRUE) "
[2] "if (strict) {"                                    
[3] "    class(from) <- \"numeric\""                   
[4] "    from"                                         
[5] "} else from"                                      
[1] 4

So, as you can see (look at attr(,"source")), the as(4L, 'numeric') simply assigns class numeric to the input object and returns it. Thus, the following two snippets are equivalent (for this case!):

> # Snippet 1
> x = 4L
> x = as(x, 'numeric')

> # Snippet 2
> x = 4L
> class(x) <- 'numeric'

Interestingly, both do 'nothing'. More interestingly, the other way round it works:

> x = 4
> class(x)
[1] "numeric"
> class(x) <- 'integer'
> class(x)
[1] "integer"

I'm not exactly sure about this (as class<- seems to be implemented in C), but my guess is that when assigning class numeric, it first checks whether the object is already numeric. That could be the case here, as integer is numeric (although not double); see also the "historical anomaly" quote below:

> x = 4L
> class(x)
[1] "integer"
> is.numeric(x)
[1] TRUE
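
class() only reports the implicit class, so it hides what happens to the underlying storage type. A small sketch using typeof() puts the two assignments side by side; the results noted in the comments are the ones the snippets above show:

```r
x <- 4L
class(x) <- "numeric"   # an integer already counts as numeric, so nothing changes
typeof(x)               # still "integer"

y <- 4
class(y) <- "integer"   # this direction really converts the storage type
typeof(y)               # "integer"

is.numeric(4L)          # TRUE: integer vectors are numeric
is.double(4L)           # FALSE: but they are not double
```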

Regarding as.numeric: it is a generic function and is identical to as.double(), which is why it 'works' (from the R help on as.numeric):

It is a historical anomaly that R has two names for its floating-point vectors, ‘double’ and ‘numeric’ (and formerly had ‘real’).

‘double’ is the name of the type. ‘numeric’ is the name of the mode and also of the implicit class.
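
The three names in that quote can be seen side by side with a tiny helper (describe_vector() is a hypothetical name, written just for this illustration):

```r
# Hypothetical helper: report type, mode and implicit class of a vector
describe_vector <- function(v) {
  c(typeof = typeof(v), mode = mode(v), class = class(v))
}

describe_vector(4)    # double  / numeric / numeric
describe_vector(4L)   # integer / numeric / integer

# as.numeric() and as.double() give the same result
typeof(as.numeric(4L))
identical(as.numeric(4L), as.double(4L))
```

Both vectors share the mode "numeric", which is why an integer can pass as "numeric" without being a double.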

Regarding questions (1) - (3): The magic happens in those four lines at the top of the as function:

where <- .classEnv(thisClass, mustFind = FALSE)
coerceFun <- getGeneric("coerce", where = where)
coerceMethods <- .getMethodsTable(coerceFun, environment(coerceFun), inherited = TRUE)
asMethod <- .quickCoerceSelect(thisClass, Class, coerceFun, coerceMethods, where)

I'm lacking the time to dig into those, sorry.

Hope that helps.



Answer 2:

Looking at the source code for as(), it has two parts (the source code has been shortened for clarity). First, it looks for existing methods for coerce(), as you described above.

function (object, Class, strict = TRUE, ext = possibleExtends(thisClass, 
    Class)) 
{
    thisClass <- .class1(object)
    where <- .classEnv(thisClass, mustFind = FALSE)
    coerceFun <- getGeneric("coerce", where = where)
    coerceMethods <- .getMethodsTable(coerceFun, environment(coerceFun), 
        inherited = TRUE)
    asMethod <- .quickCoerceSelect(thisClass, Class, coerceFun, 
        coerceMethods, where)

    # No matching signatures from the coerce table!!!
    if (is.null(asMethod)) {
        sig <- c(from = thisClass, to = Class)
        asMethod <- selectMethod("coerce", sig, optional = TRUE, 
            useInherited = FALSE, fdef = coerceFun, mlist = getMethodsForDispatch(coerceFun))

If it doesn't find any methods, as in this case, then it attempts to create a new method as follows:

        if (is.null(asMethod)) {
            canCache <- TRUE
            inherited <- FALSE

            # The integer vector is numeric!!!
            if (is(object, Class)) {
                ClassDef <- getClassDef(Class, where)
                if (identical(ext, FALSE)) {}
                else if (identical(ext, TRUE)) {}
                else {
                  test <- ext@test

                  # Create S4 coercion method here
                  asMethod <- .makeAsMethod(ext@coerce, ext@simple, 
                    Class, ClassDef, where)
                  canCache <- (!is(test, "function")) || identical(body(test), 
                    TRUE)
                }
            }
            if (is.null(asMethod)) {}
            else if (canCache) 
                asMethod <- .asCoerceMethod(asMethod, thisClass, 
                  ClassDef, FALSE, where)
            if (is.null(asMethod)) {}
            else if (canCache) {
                cacheMethod("coerce", sig, asMethod, fdef = coerceFun, 
                  inherited = inherited)
            }
        }
    }

    # Use newly created method on object here
    if (strict) 
        asMethod(object)
    else asMethod(object, strict = FALSE)
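
The branch commented "The integer vector is numeric!!!" hinges on is(object, Class). The relationship it tests can be checked directly; this is only a sketch of that one condition, and the coerce method that as() goes on to build from it may differ between R versions:

```r
library(methods)

is(4L, "numeric")               # does the object satisfy the target class?
extends("integer", "numeric")   # the same question asked of the class definitions
is("4", "numeric")              # a character vector does not, so the branch is skipped
```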

By the way, if you're only dealing with the basic atomic types, I would stick to base functions and avoid the methods package; the only reason to use methods is dealing with S4 objects.
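
For reference, a short sketch of what sticking to base functions looks like for plain atomic vectors:

```r
x <- c(1L, 2L, 3L)

y <- as.numeric(x)              # integer -> double
typeof(y)

z <- as.integer(c(1.9, -1.9))   # double -> integer, truncating toward zero
z

storage.mode(x) <- "double"     # in-place alternative that keeps attributes
typeof(x)
```

None of these calls go through coerce(), so the S4 method table plays no part in the result.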



Tags: r s4