Dependency management in R

Posted 2019-03-11 09:41

Question:

Does R have a dependency management tool to facilitate project-specific dependencies? I'm looking for something akin to Java's maven, Ruby's bundler, Python's virtualenv, Node's npm, etc.

I'm aware of the "Depends" clause in the DESCRIPTION file, as well as the R_LIBS facility, but these don't seem to work in concert to provide a solution to some very common workflows.

I'd essentially like to be able to check out a project and run a single command to build and test the project. The command should install any required packages into a project-specific library without affecting the global R installation. E.g.:

my_project/.Rlibs/*

Answer 1:

Unfortunately, the Depends: field in the DESCRIPTION file is all you get, for the following reasons:

  • R itself is reasonably cross-platform, which means any dependency mechanism has to work across platforms and operating systems
  • Extending Depends: beyond R packages would require expressing those dependencies portably across operating systems; good luck encoding even something as simple as 'a PNG graphics library' in a way that can be resolved unambiguously on every system
  • Windows does not have a package manager
  • AFAIK OS X does not have a package manager that mixes what Apple ships and what other Open Source projects provide
  • Even among Linux distributions you do not get consistency: take RStudio as an example, which ships as two separate packages (each providing its own dependencies!), one for RedHat/Fedora and one for Debian/Ubuntu

This is a hard problem.
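To make the limitation concrete, here is a small sketch of how the two kinds of dependency appear in a DESCRIPTION file. The project name and field values are made up for illustration. Depends names R packages that R can resolve itself, whereas SystemRequirements is free text that no tool resolves automatically.

```r
# Write a hypothetical DESCRIPTION file to a temporary location.
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: myproject",
  "Version: 0.1.0",
  "Depends: R (>= 3.0.0), png",
  "SystemRequirements: libpng"
), desc_path)

# read.dcf() parses the file into a character matrix, one column per field.
desc <- read.dcf(desc_path)

# R package dependencies: machine-readable, installable from a repository.
r_deps <- unname(desc[1, "Depends"])

# System dependencies: a human-readable hint only; how to install "libpng"
# differs on every operating system.
sys_deps <- unname(desc[1, "SystemRequirements"])
```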



Answer 2:

As a stop-gap, I've written a new rbundler package. It installs project dependencies into a project-specific subdirectory (e.g. <PROJECT>/.Rbundle), allowing the user to avoid using global libraries.

  • rbundler on GitHub
  • rbundler on CRAN

We've been using rbundler at Opower for a few months now and have seen a huge improvement in developer workflow, testability, and maintainability of internal packages. Combined with our internal package repository, we have been able to stabilize development of a dozen or so packages for use in production applications.

A common workflow:

  • Check out a project from GitHub
  • cd into the project directory
  • Fire up R
  • From the R console:

    library(rbundler)

    bundle('.')

All dependencies will be installed into ./.Rbundle, and an .Renviron file will be created with the following contents:

R_LIBS_USER='.Rbundle'

Any R operations run from within this project directory will adhere to the project-specific library and package dependencies. Note that, while this method uses the package DESCRIPTION to define dependencies, the project needn't have an actual package structure. Thus, rbundler becomes a general tool for managing an R project, whether it be a simple script or a full-blown package.
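For readers curious what this amounts to, the effect described above can be approximated in base R. The sketch below is not rbundler's implementation. `init_project_library` is a hypothetical helper that only creates the library directory, writes the .Renviron line shown above, and puts the library first on the search path for the current session. Installing the dependencies themselves is left as a commented-out call.

```r
# Hypothetical helper (not part of rbundler): prepare a project-local library.
init_project_library <- function(project_dir, lib_name = ".Rbundle") {
  lib_path <- file.path(project_dir, lib_name)
  dir.create(lib_path, recursive = TRUE, showWarnings = FALSE)

  # Record the library in .Renviron so R sessions started in the project
  # directory pick it up via R_LIBS_USER.
  writeLines(sprintf("R_LIBS_USER='%s'", lib_name),
             file.path(project_dir, ".Renviron"))

  # Put the project library first on the search path for this session.
  lib_path <- normalizePath(lib_path)
  .libPaths(c(lib_path, .libPaths()))
  invisible(lib_path)
}

# Demonstration in a throwaway directory.
demo_project <- file.path(tempdir(), "my_project")
dir.create(demo_project, showWarnings = FALSE)
lib_path <- init_project_library(demo_project)

# Packages would then be installed into the project library, for example:
# install.packages("somepackage", lib = lib_path)
```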



Answer 3:

You could use the following workflow:

1) create a script file containing everything you want to set up, and store it in your project directory as e.g. projectInit.R

2) source this script from your .Rprofile (or any other file executed by R at startup) with a try statement

try(source("./projectInit.R"), silent=TRUE)

This ensures that R starts without an error message even when no projectInit.R is found

3) if you start R in your project directory, the projectInit.R file will be sourced if present in the directory and you are ready to go

This is from a Linux perspective, but it should work the same way under Windows and Mac as well.
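A slightly more defensive variant of step 2 checks for the file before sourcing it, so that a missing file is skipped quietly while try still guards against errors inside the script. `source_if_present` is a hypothetical helper name, not something R provides.

```r
# Hypothetical helper for .Rprofile: source a project init script if it exists.
source_if_present <- function(path = "projectInit.R") {
  if (!file.exists(path)) {
    # Nothing to do outside a project directory.
    return(invisible(FALSE))
  }
  # try() keeps a broken init script from aborting R startup.
  result <- try(source(path), silent = TRUE)
  invisible(!inherits(result, "try-error"))
}

# In .Rprofile, the call would simply be:
source_if_present("./projectInit.R")
```

The function returns TRUE invisibly only when the script was found and ran without error, which makes it easy to log or branch on if needed.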



Answer 4:

The packrat package is precisely meant to achieve the following:

install any required packages into a project-specific library without affecting the global R installation

It allows installing different versions of the same packages in different project-local package libraries.

I am adding this answer even though this question is 5 years old, because this solution apparently didn't exist yet at the time the question was asked (as far as I can tell, packrat first appeared on CRAN in 2014).
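A typical packrat session might look like the sketch below. It assumes packrat is already installed (for example via install.packages("packrat")). The two wrapper function names are hypothetical and exist only to keep the example from running anything on load; the packrat calls inside them are the package's own init, snapshot, and restore functions.

```r
# Hypothetical wrapper: turn a directory into a packrat project.
# packrat::init() creates a private library under <project_dir>/packrat
# and records the packages the project currently uses.
setup_packrat_project <- function(project_dir = ".") {
  packrat::init(project = project_dir)
}

# Hypothetical wrapper: after checking out the project on another machine,
# reinstall the recorded package versions into the private library.
restore_packrat_project <- function(project_dir = ".") {
  packrat::restore(project = project_dir)
}

# After installing or upgrading packages inside the project, the recorded
# state can be refreshed with:
# packrat::snapshot(project = ".")
```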