I am trying to package some of my Python code that calls R code using rpy2. That R code currently sits in a separate file which I source from the Python script. For example, if the Python script is myscript.py, then the R code is stored in myscript_support.R, and I have something like the following in myscript.py:
import os
from rpy2.robjects import r
# Load the R code that sits next to this script
r.source(os.path.join(os.path.dirname(__file__), "myscript_support.R"))
# Call the R function (Python indexing uses single brackets, not R's [[ ]])
r["myscript_R_function"]()
I now want to package this Python script using setuptools, and I have a few questions:
1) How should I package the R support code, and once I have done so, how do I find the path to the R file so I can source it?
2) The R code depends on several R packages. How can I ensure that these are installed? Should I just raise an informative error if these R packages cannot be loaded?
For the source files to be installed, you need to list them in package_data. You can then find their path in exactly the same way as you do now.

For the R packages, either make setup.py check that they exist (a kind of "configtools approach") or just raise an exception when you cannot load them. Or do both; then, if something you depend on disappears for some reason, at least you will know.

This question might be dated, but I ran into the same issue today and wanted to provide more detail for the question 1 solution suggested by @ivan_pozdeev and a new solution for question 2.
1) Edit your setup.py file to:
2) Conda is quickly becoming a good option for dealing with package dependencies across both Python and R. You can create an environment (http://conda.pydata.org/docs/using/envs), install all the R and Python packages that you might need, and then generate an environment.yml file so that anyone can replicate your environment. Check out this blog post for more info: https://www.continuum.io/content/conda-data-science
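For step 1, the setup.py edit presumably follows the package_data approach. As a rough sketch only (not the original snippet), assuming the Python code lives in a package directory named myscript with myscript_support.R sitting next to its __init__.py, and with a made-up name and version:

```python
# setup.py (sketch). The package name "myscript", the version, and the layout
# myscript/__init__.py + myscript/myscript_support.R are all assumptions.
SETUP_KWARGS = {
    "name": "myscript",
    "version": "0.1",
    "packages": ["myscript"],
    # Glob patterns are relative to each package's directory.
    "package_data": {"myscript": ["*.R"]},
    # Install as a plain directory so the .R file has a real path to source.
    "zip_safe": False,
}


def main():
    # Imported here so the keyword arguments can be inspected without setuptools.
    from setuptools import setup

    setup(**SETUP_KWARGS)
```

A real setup.py would call main() on its last line; the call is left out here so the snippet can be loaded without triggering a build.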
Well, imagine yourself as the setuptools packager and think of what you would expect the programmer to do.
For the first problem, you have two choices:
The first option is implementable by passing include_package_data = True to setup() and providing masks of files to include in package_data (setuptools docs, "Including Data Files" section). Paths relative to packages' directories can be used. The files will be accessible at run time at the same relative paths through the "Resource Management API" ("Accessing Data Files at Runtime" section).

The second option would require you to add your code to setuptools before invoking setup(). For example, you may add a file finder to add relevant .R files to the results of find_packages(). Or just generate the list of files for the previous paragraph by arbitrary means.

For the second problem, the easiest way is to force setuptools to install the package as a directory rather than an .egg by specifying zip_safe = False. You might use the eager_resources option instead, which extracts a group of resources on demand ("Automatic Resource Extraction" section).

As for installing third-party R packages, an automatable technique is described at R Installation and Administration - Installing packages.
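If you go the "raise an informative error" route instead of installing R packages automatically, the run-time side could look roughly like the sketch below. It assumes the package was installed as a directory (zip_safe = False), so the R file has a real path. The names load_r_support, missing_r_packages and REQUIRED_R_PACKAGES are made up for illustration, and "ggplot2" is only a placeholder for whatever your R code actually needs. isinstalled comes from rpy2.robjects.packages.

```python
import os

# Hypothetical placeholder; list the R packages your R code really needs.
REQUIRED_R_PACKAGES = ("ggplot2",)


def missing_r_packages(required, is_installed):
    """Return the names in `required` for which `is_installed(name)` is falsy."""
    return [name for name in required if not is_installed(name)]


def load_r_support(package_dir, filename="myscript_support.R",
                   required=REQUIRED_R_PACKAGES):
    """Check the R dependencies, then source the bundled R file."""
    # Imported lazily so that this module can be imported without R present.
    from rpy2.robjects import r
    from rpy2.robjects.packages import isinstalled

    missing = missing_r_packages(required, isinstalled)
    if missing:
        raise ImportError(
            "Missing R packages: %s. Install them from R with "
            "install.packages()." % ", ".join(missing)
        )
    r.source(os.path.join(package_dir, filename))
    return r
```

Inside the package you would then call r = load_r_support(os.path.dirname(__file__)) followed by r["myscript_R_function"](). Keeping the check in a small function that takes the predicate as an argument makes it easy to test without R installed.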