What object to pass to R from rpy2?

Posted 2019-07-22 01:28

I'm unable to make the following code work, though I don't see this error when working strictly in R.

from rpy2.robjects.packages import importr
from rpy2 import robjects
import numpy as np

forecast = importr('forecast')
ts = robjects.r['ts']

y = np.random.randn(50)
X = np.random.randn(50)

y = ts(robjects.FloatVector(y), start=robjects.IntVector((2004, 1)), frequency=12)
X = ts(robjects.FloatVector(X), start=robjects.IntVector((2004, 1)), frequency=12)

forecast.Arima(y, xreg=X, order=robjects.IntVector((1, 0, 0)))

It's especially confusing considering the following code works fine

forecast.auto_arima(y, xreg=X)

I see the following traceback no matter what I give for X, using numpy interface or not. Any ideas?

---------------------------------------------------------------------------
RRuntimeError                             Traceback (most recent call last)
<ipython-input-20-b781220efb93> in <module>()
     13 X = ts(robjects.FloatVector(X), start=robjects.IntVector((2004, 1)),  frequency=12)
     14 
---> 15 forecast.Arima(y, xreg=X, order=robjects.IntVector((1, 0, 0)))

/home/skipper/.local/lib/python2.7/site-packages/rpy2/robjects/functions.pyc in     __call__(self, *args, **kwargs)
    84                 v = kwargs.pop(k)
    85                 kwargs[r_k] = v
---> 86         return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)

/home/skipper/.local/lib/python2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
    33         for k, v in kwargs.iteritems():
    34             new_kwargs[k] = conversion.py2ri(v)
---> 35         res = super(Function, self).__call__(*new_args, **new_kwargs)
    36         res = conversion.ri2py(res)
    37         return res

RRuntimeError: Error in `colnames<-`(`*tmp*`, value = if (ncol(xreg) == 1) nmxreg else    paste(nmxreg,  : 
length of 'dimnames' [2] not equal to array extent

Edit:

The problem is that the following lines of code do not evaluate to a column name, which seems to be the expectation on the R side.

sub = robjects.r['substitute']
deparse = robjects.r['deparse']
deparse(sub(X))

I don't know well enough what the expectations of this code should be in R, but I can't find an RPy2 object that passes this check by returning something of length == 1. This really looks like a bug to me.

R> length(deparse(substitute((rep(.2, 1000)))))
[1] 1

But in Rpy2

[~/]
[94]: robjects.r.length(robjects.r.deparse(robjects.r.substitute(robjects.r('rep(.2, 1000)'))))
[94]: 
<IntVector - Python:0x7ce1560 / R:0x80adc28>
[      78]

Tags: python r rpy2
2 answers
戒情不戒烟
#2 · 2019-07-22 01:30

There is a way to simply pass your variables to R without substitutions and return the results to Python. You can find a simple example here: https://stackoverflow.com/a/55900840/5350311 . I think it makes it clearer what you are passing to R and what you get back in return, especially if you are working with for loops and a large number of variables.

再贱就再见
#3 · 2019-07-22 01:33

This is one manifestation (see this other related issue for example) of the same underlying issue: R expressions are evaluated lazily and can be manipulated within R, which leads to idioms that do not translate well (in Python, expressions are evaluated immediately, and one has to move to the AST to manipulate code).
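To make the Python side of that contrast concrete, here is a minimal sketch using only the standard library. It is an analogy, not rpy2 code: the describe helper is hypothetical, and ast.unparse assumes Python 3.9 or later. It shows that a callee only ever sees the evaluated value, and that keeping the expression itself means holding it as source text and going through the AST.

```python
import ast


def describe(value):
    # Python evaluated the argument before the call, so only the resulting
    # value is visible here; the source text of the argument is gone.
    return repr(value)


# Eager evaluation: the callee receives a list, not the expression "[0.2] * 3".
seen_by_callee = describe([0.2] * 3)

# To keep the code itself, hold it as source text and parse it to an AST.
source = "[0.2] * 3"
tree = ast.parse(source, mode="eval")

# ast.unparse (Python 3.9+) turns the expression node back into a string,
# loosely comparable to deparse() applied to an unevaluated R expression.
label = ast.unparse(tree.body)

# Evaluation happens only when explicitly requested.
value = eval(compile(tree, "<expr>", "eval"))

print(seen_by_callee)
print(label)
print(value)
```

This is roughly the situation rpy2 is in: by the time a Python object reaches the R function, there is no expression left for substitute() to capture, only the data.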

An answer to the second part of your question: in R, substitute(rep(.2, 1000)) passes the unevaluated expression rep(.2, 1000) to substitute(). Doing this in rpy2

substitute('rep(.2, 1000)')

is passing a string; the R equivalent would be

substitute("rep(.2, 1000)")

The following lets you get close to R's deparse(substitute()):

from rpy2.robjects.packages import importr
base = importr('base')
from rpy2 import rinterface

# expression
e = rinterface.parse('rep(.2, 1000)')
dse = base.deparse(base.substitute(e))

>>> len(dse)
1
>>> print(dse) # not identical to R
"expression(rep(0.2, 1000))"

Currently, one way to work around this is to bind R objects to R symbols (preferably in a dedicated environment rather than in GlobalEnv), and use the symbols in an R call written as a string:

from rpy2.robjects import Environment, reval

env = Environment()
for k,v in (('y', y), ('xreg', X), ('order', robjects.IntVector((1, 0, 0)))):
    env[k] = v

# make an expression; it must refer to the symbols bound in env (y, xreg, order)
expr = rinterface.parse("forecast::Arima(y, xreg=xreg, order=order)")
# evaluate in the environment
res = reval(expr, envir=env)

This is not something I am happy about as a solution, but I have never found the time to work on a better solution.

Edit: With rpy2-2.4.0 it became possible to use R symbols and do the following:

RSymbol = robjects.rinterface.SexpSymbol
pairlist = (('x', RSymbol('y')),
            ('xreg', RSymbol('xreg')),
            ('order', RSymbol('order')))
res = forecast.Arima.rcall(pairlist,
                           env)

This is not yet the most intuitive interface. Maybe something using a context manager would be better.
