Creating Classes in R: S3, S4, R5 (RC), or R6? [cl

Posted 2019-01-31 11:40

Question:

I'm posting an open question about the advantages and disadvantages of the different ways to create new classes in R. From what I can tell, there are essentially four paradigms for creating classes in R: S3, S4, R5 (or RC), and R6.

S3 is what most of the core libraries of R use, and there seems to be some merit to sticking with its simple, lightly structured pattern of generic method dispatch. Still, I'd like to avoid it, for reasons I can't fully articulate but that relate to encapsulation, where methods are defined, and so on. For example, structuring a class seems rather cumbersome because the generic methods are defined outside of the class.

S4 doesn't seem to be any better, but it does have a poor man's notion of type safety, which makes obvious mistakes more apparent. However, I still feel that S4 classes are hard to maintain, because I'm unsure how encapsulation and similar concerns are handled with them. Another thing that confuses me is that there is little to no notion of namespacing.
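To make the "poor man's type safety" concrete, here is a minimal sketch using only the methods package that ships with R. The Person class, its slots, and the validity rule are hypothetical names chosen for illustration; they are not from any particular package.

```r
library(methods)

# Hypothetical class for illustration: each slot is declared with a type.
setClass(
  "Person",
  slots = c(name = "character", age = "numeric"),
  validity = function(object) {
    # Extra rule beyond the slot types: exactly one non-negative age.
    if (length(object@age) != 1 || object@age < 0) {
      return("age must be a single non-negative number")
    }
    TRUE
  }
)

alice <- new("Person", name = "Alice", age = 30)

# A value of the wrong type for a slot is rejected when the object is created.
bad_type <- tryCatch(
  new("Person", name = "Bob", age = "thirty"),
  error = function(e) "rejected"
)

# The validity function rejects a value of the right type but an invalid range.
bad_value <- tryCatch(
  new("Person", name = "Carol", age = -1),
  error = function(e) "rejected"
)
```

Note that these checks run when new() or validObject() is called, so this is a runtime check rather than static typing.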

R5 seems a little closer to what I'm used to, in that method definitions are bound to classes rather than to dispatching functions. Organizing an object as a class here takes the kind of design thinking I'm familiar with. One possible disadvantage is that R5 is itself built on top of S4.
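As a rough illustration of methods being bound to the class, here is a small Reference Class sketch using setRefClass from the methods package. The Counter class and its increment method are made-up names for this example.

```r
library(methods)

# Hypothetical counter class: fields and methods are declared together.
Counter <- setRefClass(
  "Counter",
  fields = list(count = "numeric"),
  methods = list(
    increment = function(by = 1) {
      # <<- modifies the field of this object in place.
      count <<- count + by
      invisible(.self)
    }
  )
)

a <- Counter$new(count = 0)
b <- a                # b refers to the same object; nothing is copied
a$increment()
a$increment(2)
b$count               # reflects the changes made through a

d <- a$copy()         # an explicit, independent copy
d$increment(10)
a$count               # unchanged by the update to d
```

The assignment b <- a shares one object, which is the main behavioral difference from S3 and S4 objects; copy() is needed when an independent object is wanted.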

R6 seems to be a rewrite of R5 by an individual that adds more OOP features to the mix, such as private and public functions and properties, but I can hardly find any support for these classes otherwise, as information about them seems to be sparse through Google searching.

As you can tell, I'm struggling with the OO concepts in R and I can't seem to figure out the following facets that are normally associated with OOP:

  1. Type Safety / Types
  2. Method / Object binding, encapsulation, member variables, etc.
  3. Namespacing and organizing code
  4. Versions of Inheritance.

I'm wondering if someone can provide an answer that can describe what the preferred class system is in the R community, and how to best think about when to use classes.

Answer 1:

It seems you are already aware of some of the definitions and uses for the various OOP types. I will give my opinion on when it is appropriate to use which.

  1. Use S3 classes for situations where both of the following apply: (a) your object is static and not self-modifying, and (b) you do not care about multi-argument method signatures, i.e., your method dispatches purely on its first argument, the S3 class of the object. Additionally, S3 classes are a good solution when you can live with these restrictions and want to overload many operators.

  2. Use S4 classes if your object is static and not self-modifying, but you care about multi-argument method signatures. From my experience, S4 OOP has always been more hassle than it is worth, although it "guarantees" type safety to some extent.

  3. Use reference classes if your object is self-modifying. Otherwise, you will have to define many replacement functions (e.g., some_method<-, which is called with the syntax some_method(obj) <- value). This is awkward and can be computationally slow, since R's copy-on-modify semantics may create a full copy of the object on each call. R6 is a good substitute, although I have not found it necessary for my purposes.
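The difference between the points above can be sketched in a few lines of base R. The Cat and Dog classes, the meet generic, and the `name<-` replacement function are all hypothetical, chosen only to show multi-argument dispatch and the replacement-function syntax.

```r
library(methods)

# Hypothetical classes and generic, for illustration only.
setClass("Cat", slots = c(name = "character"))
setClass("Dog", slots = c(name = "character"))

# An S4 generic can dispatch on the classes of several arguments,
# whereas an S3 generic dispatches on its first argument only.
setGeneric("meet", function(x, y) standardGeneric("meet"))
setMethod("meet", signature("Cat", "Dog"), function(x, y) "hiss")
setMethod("meet", signature("Dog", "Cat"), function(x, y) "chase")
setMethod("meet", signature("Dog", "Dog"), function(x, y) "sniff")

tom <- new("Cat", name = "Tom")
rex <- new("Dog", name = "Rex")

meet(tom, rex)   # uses the (Cat, Dog) method
meet(rex, tom)   # uses the (Dog, Cat) method

# A replacement function for a copy-on-modify object.
# `name(obj) <- value` is evaluated as obj <- `name<-`(obj, value),
# so the function must return the modified object.
`name<-` <- function(obj, value) {
  obj@name <- value
  obj
}

name(rex) <- "Max"
rex@name
```

Every field that should be updatable this way needs its own replacement function, which is the boilerplate that reference classes avoid.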

Most people new to R think the language is confused, and that it has so many OOP implementations because no consensus was ever reached.

This is incorrect.

Due to its statistical nature, most heterogeneous structures in R (i.e., things that should be objecty) end up being the result of a statistical algorithm: an lm, glmnet, gbm, etc. object. It usually suffices to bundle this information and provide the expected interfaces for summarizing it: print, summary, etc.

Owing to its legacy as a statistical playground, this frees the user from having to think about more advanced concepts like inheritance and allocation / de-allocation, and opens the playing field to more contributors. This means that it is slightly more annoying to create complex projects (e.g., web servers, text parsers, graphical interfaces, etc.) in R than in a typical object-driven language like Ruby, but the lack of a uniform OOP-type is balanced by ease of use.

One final way to think about it is that the different approaches are like states of matter: solid, liquid, gas. Rather than treating all heterogeneous structures (i.e., OOP-like things) uniformly, some fall more naturally under one structure than another. If I am wrapping a simple list in an S3 class to display nicely with an overloaded print method, it would be rather silly to set up a whole reference class for this purpose.
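For that last case, a minimal sketch of wrapping a list in an S3 class with its own print method might look like this. The fit_summary class and the new_fit_summary constructor are hypothetical names used only for the example.

```r
# Hypothetical constructor: a plain list tagged with an S3 class attribute.
new_fit_summary <- function(model, rmse) {
  structure(list(model = model, rmse = rmse), class = "fit_summary")
}

# print() is an S3 generic, so a function named print.fit_summary
# is what print(x) dispatches to for objects of this class.
print.fit_summary <- function(x, ...) {
  cat("Model:", x$model, "\n")
  cat("RMSE: ", format(x$rmse, digits = 3), "\n")
  invisible(x)
}

res <- new_fit_summary("linear", 1.23456)
print(res)
```

The object is still an ordinary list underneath, so res$rmse and the usual list tools keep working.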



Tags: r oop