So this is kind of a trivial question, but it's bugging me that I can't answer it, and perhaps the answer will teach me some more details about how R works.
The title says it all: how does R parse `->`, the obscure right-side assignment function?
My usual tricks to dive into this failed:
`->`
Error: object '->' not found
getAnywhere("->")
no object named '->' was found
And we can't call it directly:
`->`(3,x)
Error: could not find function "->"
But of course, it works:
(3 -> x) #assigns the value 3 to the name x
# [1] 3
It appears R knows how to simply reverse the arguments, but I thought the above approaches would surely have cracked the case:
pryr::ast(3 -> y)
# \- ()
#   \- `<-   # R interpreter clearly flipped things around
#   \- `y    # (by the time it gets to `ast`, at least...)
#   \-  3    # (note: this is because `substitute(3 -> y)`
#            #  already returns the reversed version)
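The same thing can be checked without pryr. This is just a quick sketch in base R; the variable names `right` and `left` are mine:

```r
# Capture both spellings as unevaluated calls.
right <- quote(3 -> y)
left  <- quote(y <- 3)

# By the time R hands us the call, the two are the same object.
identical(right, left)    # TRUE
as.character(right[[1]])  # "<-"  (the function being called)
deparse(right)            # "y <- 3"
```

So whatever reverses the arguments has already happened before `quote` or `substitute` ever see the expression.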
Compare this to the regular assignment operator:
`<-`
.Primitive("<-")
`<-`(x, 3) #assigns the value 3 to the name x, as expected
`?"->"`, `?assignOps`, and the R Language Definition all simply mention it in passing as the right assignment operator.
But there's clearly something unique about how `->` is used. It's not a function/operator (as the calls to `getAnywhere` and directly to `->` seem to demonstrate), so what is it? Is it completely in a class of its own?
Is there anything to learn from this besides "`->` is completely unique within the R language in how it's interpreted and handled; memorize and move on"?
Let me preface this by saying I know absolutely nothing about how parsers work. Having said that, line 296 of gram.y defines the tokens that represent assignment in the (YACC?) parser R uses, among them `LEFT_ASSIGN` and `RIGHT_ASSIGN`.
Then, lines 5140 through 5150 of gram.c look like the corresponding C code.
Finally, the definition of `install_and_save2` starts on line 5044 of gram.c.
So again, having zero experience working with parsers, it seems that `->` and `->>` are translated directly into `<-` and `<<-`, respectively, at a very low level in the interpretation process.
You brought up a very good point in asking how the parser "knows" to reverse the arguments to `->`, considering that `->` appears to be installed into the R symbol table as `<-`, and is thus able to correctly interpret `x -> y` as `y <- x` and not `x <- y`. The best I can do is provide further speculation as I continue to come across "evidence" to support my claims. Hopefully some merciful YACC expert will stumble on this question and provide a little insight; I'm not going to hold my breath on that, though.
Going back to lines 383 and 384 of gram.y, there is some more parsing logic related to the aforementioned `LEFT_ASSIGN` and `RIGHT_ASSIGN` symbols.
Although I can't really make heads or tails of this crazy syntax, I did notice that the second and third arguments to `xxbinary` are in opposite order for `LEFT_ASSIGN` (`xxbinary($2,$1,$3)`) and `RIGHT_ASSIGN` (`xxbinary($2,$3,$1)`).
Here's what I'm picturing in my head:
`LEFT_ASSIGN` scenario: `y <- x`
`$2` is the second "argument" to the parser in the above expression, i.e. `<-`
`$1` is the first; namely `y`
`$3` is the third; `x`
Therefore, the resulting (C?) call would be `xxbinary(<-, y, x)`.
Applying this logic to `RIGHT_ASSIGN`, i.e. `x -> y`, combined with my earlier conjecture about `<-` and `->` getting swapped:
`$2` gets translated from `->` to `<-`
`$1` is `x`
`$3` is `y`
But since the result is `xxbinary($2,$3,$1)` instead of `xxbinary($2,$1,$3)`, the result is still `xxbinary(<-, y, x)`.
Building off of this a little further, we have the definition of `xxbinary` on line 3310 of gram.c.
Unfortunately I could not find a proper definition of `lang3` (or its variants `lang1`, `lang2`, etc...) in the R source code, but I'm assuming that it is used for evaluating special functions (i.e. symbols) in a way that is synchronized with the interpreter.
Updates
I'll try to address some of your additional questions in the comments as best I can given my (very) limited knowledge of the parsing process.
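Before getting to those, the argument swap I described can be mimicked in R itself. This is only a sketch: `xxbinary_sketch` is a made-up stand-in for the C function, and I'm assuming it needs to do nothing more than assemble a three-element call from an operator and two operands.

```r
# Hypothetical stand-in for the C-level xxbinary(); NOT R's real implementation.
# It just builds a call from an operator symbol and two operands.
xxbinary_sketch <- function(op, a, b) as.call(list(op, a, b))

# LEFT_ASSIGN, y <- x: tokens are $1 = y, $2 = `<-`, $3 = x,
# and the rule passes them as xxbinary($2, $1, $3).
tok_left <- list(as.name("y"), as.name("<-"), as.name("x"))
left <- xxbinary_sketch(tok_left[[2]], tok_left[[1]], tok_left[[3]])

# RIGHT_ASSIGN, x -> y: tokens are $1 = x, $2 = `<-` (already translated
# from `->`), $3 = y, and the rule passes them as xxbinary($2, $3, $1).
tok_right <- list(as.name("x"), as.name("<-"), as.name("y"))
right <- xxbinary_sketch(tok_right[[2]], tok_right[[3]], tok_right[[1]])

identical(left, right)           # TRUE
identical(right, quote(x -> y))  # TRUE
```

Both rules end up building the same call, `y <- x`, which matches what the real parser hands back for `x -> y`.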
First, I agree that this lies outside of that domain. I believe Chambers' quote concerns the R Environment, i.e. processes that are all taking place after this low level parsing phase. I'll touch on this a little bit more below, however. Anyways, the only other example of this sort of behavior I could find is the `**` operator, which is a synonym for the more common exponentiation operator `^`. As with right assignment, `**` doesn't seem to be "recognized" as a function call, etc... by the interpreter.
I found this because it's the only other case where `install_and_save2` is used by the C parser.
Of course I'm still speculating here, but yes, I think we can safely assume that when you call `substitute(3 -> y)`, from the perspective of the substitute function, the expression always was `y <- 3`; i.e. the function is completely unaware that you typed `3 -> y`. `do_substitute`, like 99% of the C functions used by R, only handles `SEXP` arguments - a `LANGSXP` (a call object) in the case of `3 -> y` (== `y <- 3`), I believe. This is what I was alluding to above when I made a distinction between the R Environment and the parsing process.
I don't think there is anything that specifically triggers the parser to spring into action; rather, everything you input into the interpreter gets parsed. I did a little more reading about the YACC / Bison parser generator last night, and as I understand it (a.k.a. don't bet the farm on this), Bison uses the grammar you define (in the `.y` file(s)) to generate a parser in C, i.e. a C function which does the actual parsing of input. In turn, everything you input in an R session is first processed by this C parsing function, which then delegates the appropriate action to be taken in the R Environment (I'm using this term very loosely by the way). During this phase, `lhs -> rhs` will get translated to `rhs <- lhs`, `**` to `^`, etc...
For example, look at the tables of primitive functions in names.c. You will notice that `->`, `->>`, and `**` are not defined there. As far as I know, R primitive expressions such as `<-` and `[`, etc... are the closest interaction the R Environment ever has with any underlying C code. What I am suggesting is that by this stage in the process (from you typing a set of characters into the interpreter and hitting 'Enter', up through the actual evaluation of a valid R expression), the parser has already worked its magic, which is why you can't get a function definition for `->` or `**` by surrounding them with backticks, as you typically can.
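A few checks from the R side seem consistent with this picture. Treat it as a sketch rather than proof: it only shows what is visible from R, and the variable name `pd` is mine.

```r
# `<-` and `^` exist as functions; `->` and `**` do not.
exists("<-")   # TRUE
exists("->")   # FALSE
exists("^")    # TRUE
exists("**")   # FALSE

# The parsed calls are indistinguishable from the "canonical" spellings.
identical(quote(2 ** 3), quote(2 ^ 3))   # TRUE
identical(quote(3 -> y), quote(y <- 3))  # TRUE

# The token table kept alongside the source still records what was typed.
pd <- utils::getParseData(parse(text = "3 -> y", keep.source = TRUE))
pd[pd$terminal, c("token", "text")]
```

The `token` column of that last table should include `RIGHT_ASSIGN`, so the parser does distinguish `->` from `<-` while tokenizing; the distinction just doesn't survive into the call that gets evaluated.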