How to define the subset operators for an S4 class?

Posted 2019-01-11 00:01

Question:

I am having trouble figuring out the proper way to define the [, $, and [[ subset operators for an S4 class.

Can anyone provide me with a basic example of defining these three for an S4 class?

Answer 1:

First, look up the generic so that we know what we are aiming for:

> getGeneric("[")
standardGeneric for "[" defined from package "base"

function (x, i, j, ..., drop = TRUE) 
standardGeneric("[", .Primitive("["))
<bytecode: 0x32e25c8>
<environment: 0x32d7a50>
Methods may be defined for arguments: x, i, j, drop
Use  showMethods("[")  for currently available ones.

Define a simple class

setClass("A", representation=representation(slt="numeric"))

and implement a method

setMethod("[", c("A", "integer", "missing", "ANY"),
    ## we won't support subsetting on j; dispatching on 'drop' doesn't
    ## make sense (to me), so in rebellion we'll quietly ignore it.
    function(x, i, j, ..., drop=TRUE)
{
    ## less clever: update slot, return instance
    ## x@slt = x@slt[i]
    ## x
    ## clever: by default initialize is a copy constructor, too
    initialize(x, slt=x@slt[i])
})

In action:

> a = new("A", slt=1:5)
> a[3:1]
An object of class "A"
Slot "slt":
[1] 3 2 1

There are different strategies for supporting the (implicitly) many signatures; for instance, you'd likely also want to support logical and character index values, possibly for both i and j. The most straightforward is a "facade" pattern: each method does some preliminary coercion to a common type of subset index (e.g., integer, which allows for re-ordering and repetition of index entries) and then uses callGeneric to invoke a single method that does the actual work of subsetting the class.
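
A minimal sketch of that facade pattern, under some assumptions: the class "B" is hypothetical and made up for this illustration, it has a single named numeric slot, and only subsetting on i is supported. The logical and character methods only translate the index to integer positions and then hand off to the integer method via callGeneric.

```r
library(methods)

## Hypothetical class used only for this illustration
setClass("B", representation(slt = "numeric"))

## Worker: the only method that actually subsets the slot
setMethod("[", c("B", "integer", "missing", "ANY"),
    function(x, i, j, ..., drop = TRUE)
{
    initialize(x, slt = x@slt[i])
})

## Facade: logical index -> integer positions, then re-dispatch
setMethod("[", c("B", "logical", "missing", "ANY"),
    function(x, i, j, ..., drop = TRUE)
{
    i <- which(i)
    callGeneric(x, i)
})

## Facade: character index -> integer positions via the names of the slot
setMethod("[", c("B", "character", "missing", "ANY"),
    function(x, i, j, ..., drop = TRUE)
{
    i <- match(i, names(x@slt))
    callGeneric(x, i)
})

b <- new("B", slt = c(a = 1, b = 2, c = 3))
b[c(TRUE, FALSE, TRUE)]   # keeps elements 'a' and 'c'
b[c("c", "a")]            # re-orders by name
```

This sketch does not recycle short logical indices or check for unmatched names (match() returns NA for those); a real class would probably want to validate the index in the facade methods before re-dispatching.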

There are no conceptual differences for [[, other than that it should respect the convention of returning the content (an element) rather than another instance of the class, as [ does. For $ we have

> getGeneric("$")
standardGeneric for "$" defined from package "base"

function (x, name) 
standardGeneric("$", .Primitive("$"))
<bytecode: 0x31fce40>
<environment: 0x31f12b8>
Methods may be defined for arguments: x
Use  showMethods("$")  for currently available ones.

and

setMethod("$", "A",
    function(x, name)
{
    ## 'name' is a character(1)
    slot(x, name)
})

with

> a$slt
[1] 1 2 3 4 5
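
Returning to [[, here is a small sketch of the "return the content, not an instance" semantics mentioned above. The class "Elt" is hypothetical and exists only for this example; the generic for [[ has the arguments x, i, j and ..., and the method below supports a numeric i only.

```r
library(methods)

## Hypothetical class used only for this illustration
setClass("Elt", representation(slt = "numeric"))

setMethod("[[", c("Elt", "numeric", "missing"),
    function(x, i, j, ...)
{
    ## return the element itself, not a new "Elt" instance
    x@slt[[i]]
})

e <- new("Elt", slt = c(10, 20, 30))
e[[2]]
## [1] 20
```

Because the result is a plain numeric value, is(e[[2]], "Elt") is FALSE, in contrast to what a [ method would normally return.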


Answer 2:

I would do as @Martin_Morgan suggested for the operators you mentioned. I would add a couple of points though:

1) I would be careful about defining a $ operator to access an S4 slot (unless you intend to access a column from a data frame which is stored in a specific slot?). The general suggestion is to write accessor functions like getMySlot() and setMySlot() to get the information you need. You can use the @ operator to access data from those slots, although get and set are best as a user interface. Using $ could be confusing for the user, who would probably expect a data.frame. See this S4 tutorial by Christophe Genolini for an in-depth discussion of these issues. If this is not how you intended to use $, disregard my suggestion (but the tutorial is still a great resource!).

2) If you are defining [ and [[ for a class that inherits from another class, like vector, you will also want to define el() (equivalent to x[i][[1L]], i.e., the first element of the subset x[i]) and length(). I am currently writing a class that inherits from numeric, and numeric methods will automatically try to use these functions on your class. If the class is for more limited use, or only for your own personal use, this may not be a problem.
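
The two points above could be sketched roughly as follows. Everything named here is an assumption made for illustration: the class "Acc" and the accessor names getSlt and setSlt<- are invented, and the setter is written as a replacement function, which is one common way to spell a "set" accessor in S4.

```r
library(methods)

## Hypothetical class used only for this illustration
setClass("Acc", representation(slt = "numeric"))

## Point 1: accessor functions instead of a "$" method
setGeneric("getSlt", function(object) standardGeneric("getSlt"))
setMethod("getSlt", "Acc", function(object) object@slt)

setGeneric("setSlt<-", function(object, value) standardGeneric("setSlt<-"))
setMethod("setSlt<-", "Acc", function(object, value)
{
    object@slt <- value
    validObject(object)   # re-check validity after modifying the slot
    object
})

## Point 2: a length() method, so code that asks for the length of the
## object gets the number of stored elements
setMethod("length", "Acc", function(x) length(x@slt))

obj <- new("Acc", slt = c(1, 2, 3))
setSlt(obj) <- c(4, 5)
getSlt(obj)
## [1] 4 5
length(obj)
## [1] 2
```

Keeping @ inside the accessor methods means the slot layout can change later without breaking user code that only calls getSlt() and setSlt<-.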

I apologize, I would have left this as a comment, but I'm new to SO and I don't have the rep yet!



Tags: oop r subset s4