I am sifting through a package and scripts that utilize the package, and would like to identify external dependencies. The goal is to modify scripts to specify `library(pkgName)` and to modify functions in the package to use `require(pkgName)`, so that these dependencies will be more obvious later.
I am revising the code to account for each externally dependent package. As an example, though it is by no means definitive, I am now finding it difficult to identify code that depends on `data.table`. I could replace `data.table` with `Matrix`, `ggplot2`, `bigmemory`, `plyr`, or many other packages, so feel free to answer with examples based on other packages.
This search isn't particularly easy. The approaches I have tried so far include:

- Search the code for `library` and `require` statements.
- Search for mentions of `data.table` (e.g. `library(data.table)`).
- Try running `codetools::checkUsage` to determine where there may be some issues. For the scripts, my program inserts the script into a local function and applies `checkUsage` to that function. Otherwise, I use `checkUsagePackage` for the package.
- Look for statements that are somewhat unique to `data.table`, such as `:=`.
- Look for where objects' classes may be identified via Hungarian notation, such as `DT`.
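The third approach can be sketched roughly as follows: wrap a script in a local function so that `codetools::checkUsage` can inspect it. `checkScriptUsage` is a hypothetical name. The sketch assumes the script parses cleanly. It also assumes that unresolved names should be looked up from the global environment, as they would be when the script is sourced.

```r
# Hypothetical helper: wrap a script's top-level code in a function body so that
# codetools::checkUsage() can analyse it like any other function.
checkScriptUsage <- function(path) {
  exprs <- parse(file = path, keep.source = FALSE)  # all top-level expressions
  scriptFun <- function() NULL
  # Build `{ expr1; expr2; ... }` and install it as the function body.
  body(scriptFun) <- as.call(c(as.name("{"), as.list(exprs)))
  # Resolve free variables from the global environment, as source() would.
  environment(scriptFun) <- globalenv()
  # checkUsage() writes its findings via `report` (cat by default); capture them.
  utils::capture.output(
    codetools::checkUsage(scriptFun, name = basename(path), all = TRUE)
  )
}
```

Each element of the result is one message, such as a "no visible global function definition" note. In a session where `data.table` is not attached, calls like `setkey()` should show up there. That flags a script that uses the package without loading it.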
The essence of my searching is to find:

- loading of `data.table`,
- objects with names that indicate they are `data.table` objects,
- methods that appear to be `data.table`-specific.
The only easy part of this seems to be finding where the package is loaded. Unfortunately, not all functions explicitly load or require the external package; some assume it has already been loaded. This is a bad practice, and I am trying to fix it. However, searching for objects and methods seems to be challenging.
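Finding where packages are loaded can be made more reliable than a text search by asking R's parser instead. The parser ignores matches inside comments and strings, and it also catches `pkg::fun` calls. Below is a minimal sketch, assuming the files parse cleanly; `findPackageRefs` is a hypothetical name. It only recognises a literal first argument. For example, `library(p, character.only = TRUE)` would report the variable name `p` rather than the package.

```r
# Hypothetical helper: list packages a file refers to, using parse data
# rather than a regex, so comments and string literals are ignored.
findPackageRefs <- function(path) {
  pd <- utils::getParseData(parse(file = path, keep.source = TRUE))
  if (is.null(pd)) return(list(loaded = character(0), qualified = character(0)))
  # Keep terminal tokens only, in source order.
  pd <- pd[pd$terminal, ]
  pd <- pd[order(pd$line1, pd$col1), ]
  tok <- pd$token
  txt <- pd$text
  # pkg::fun and pkg:::fun: the parser tags the left-hand side as SYMBOL_PACKAGE.
  qualified <- txt[tok == "SYMBOL_PACKAGE"]
  # library(pkg), require("pkg"), ...: a loader call, then '(' and a name or string.
  loaders <- c("library", "require", "requireNamespace", "loadNamespace")
  hits <- which(tok == "SYMBOL_FUNCTION_CALL" & txt %in% loaders)
  hits <- hits[hits + 2 <= length(tok)]
  argOk <- tok[hits + 1] == "'('" & tok[hits + 2] %in% c("SYMBOL", "STR_CONST")
  loaded <- gsub("^[\"']|[\"']$", "", txt[hits[argOk] + 2])  # strip quotes
  list(loaded = sort(unique(loaded)), qualified = sort(unique(qualified)))
}
```

The file is only parsed, never run, so the referenced packages do not need to be installed.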
This (`data.table`) is just one package, and one with what seems to be limited and somewhat unique usage. Suppose I wanted to look for uses of `ggplot2` functions, where the options are more extensive and the syntax is not as idiosyncratic (i.e. frequent use of `+` is not idiosyncratic, while `:=` seems to be).
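One option that works on names rather than syntax, and so also covers packages like `ggplot2`, is to ask `codetools::findGlobals` for each function's free variables. Those are then intersected with the target package's exports. The sketch below is rough, and `externalUsage` is a hypothetical helper.

It reports name matches only, so it has two blind spots:

- A local function that happens to share a name with an export produces a false positive.
- Anything reached through strings or `do.call` is missed.

`data.table` exports `:=`, so a call such as `DT[, x := 1]` should surface `:=` as a match.

```r
# Hypothetical helper: for each function in `env`, which of its free
# variables (globals) share a name with an export of `extPkg`?
externalUsage <- function(env, extPkg) {
  exports <- getNamespaceExports(extPkg)  # loads, but does not attach, extPkg
  funNames <- Filter(function(nm) is.function(get(nm, envir = env)), ls(env))
  hits <- lapply(funNames, function(nm) {
    globals <- codetools::findGlobals(get(nm, envir = env))
    sort(intersect(globals, exports))
  })
  names(hits) <- funNames
  hits[lengths(hits) > 0]  # drop functions with no matches
}
```

For a whole package, pass its namespace, e.g. `externalUsage(asNamespace("myPackage"), "data.table")`. For a script, wrap it in a function first and put that function in an environment.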
I don't think that static analysis will give a perfect answer; e.g. one could pass a function an argument that specifies which package to load. Nonetheless: are there any core tools or packages that can improve on this brute-force approach, either via static or dynamic analysis?
For what it's worth, `tools::pkgDepends` only addresses dependencies at the package level, not the function or script level, which is the level I'm working at.
Update 1: An example of a dynamic analysis tool that should work is one that reports which packages are loaded during code execution. I don't know if such a capability exists in R, though; it would be like `Rprof` reporting the output of `search()` instead of the code stack.
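A crude dynamic check, short of a profiler for `search()`, is to snapshot `search()` and `loadedNamespaces()` before and after running a script. The sketch rests on these assumptions:

- `packagesLoadedBy` is a hypothetical name.
- It should be run in a fresh session (for example via `Rscript`), because anything attached beforehand is invisible to the diff.
- It only sees code paths that actually execute.

```r
# Hypothetical helper: run a script and report which packages it attached
# (search path) or loaded (namespaces) that were not there beforehand.
packagesLoadedBy <- function(path, envir = new.env(parent = globalenv())) {
  attachedBefore <- search()
  loadedBefore <- loadedNamespaces()
  sys.source(path, envir = envir)  # evaluate the script's code
  list(
    attached = setdiff(search(), attachedBefore),
    loaded   = setdiff(loadedNamespaces(), loadedBefore)
  )
}
```

A namespace listed under `loaded` but not under `attached` usually comes from a `pkg::fun` call or from another package's imports.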
First, thanks to @mathematical.coffee for putting me on the path of using Mark Bravington's `mvbutils` package. The `foodweb` function is more than satisfactory.

To recap, I wanted to know about checking one package, say `myPackage`, versus another, say `externalPackage`, and about checking scripts against the `externalPackage`. I'll demonstrate how to do each. In this case, the external package is `data.table`.

1: For `myPackage` versus `data.table`, running `foodweb` over both packages suffices. This produces an excellent graph showing which functions depend on functions in `data.table`. Although the graph includes dependencies within `data.table`, it's not overly burdensome. I can easily see which of my functions depend on `data.table`, and which functions they use, such as `as.data.table`, `data.table`, `:=`, `key`, and so on.

At this point, one could say the package dependency problem is solved, but `foodweb` offers so much more, so let's look at that. The cool part is the dependency matrix (`depMat`). It shows dependencies of functions in my package on functions not in my package, namely functions in `data.table`. (I'm using verbose names in my package, e.g. `myPackage.cleanData`.) Rows and columns where there are no dependencies are eliminated. This is concise and lets me survey dependencies quickly. I can also find the complementary set for my functions quite easily by processing `rownames(depMat)`.

NB: `plotting = FALSE` doesn't seem to prevent a plotting device from being created, at least the first time that `foodweb` is called in a sequence of calls. That is annoying, but not terrible. Maybe I'm doing something wrong.

2: For scripts versus `data.table`, this gets a little more interesting. For each script, I need to create a temporary function, and then check for dependencies. I wrote a little function, `checkScriptDependencies`, that does precisely that. Now, I just need to look at `listDeps`, and I have the same kind of wonderful little insights that I have from `depMat`.

I modified `checkScriptDependencies` from other code that I wrote that sends scripts to be analyzed by `codetools::checkUsage`. It's good to have a little function like this around for analyzing standalone code. Kudos to @Spacedman and @Tommy for insights that improved the call to `foodweb`, using `environment()`.

(True hungaRians will notice that I was inconsistent with the order of name and type; tooBad. :) There's a longer reason for this, but this isn't precisely the code I'm using, anyway.)
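The author's `checkScriptDependencies` is not reproduced here, so what follows is only a hypothetical reconstruction of the idea. It has three steps:

1. Wrap the script in a function.
2. Run `foodweb` over that function plus the external package.
3. Prune the caller/callee matrix the way `depMat` was pruned.

The sketch relies on two assumptions, which you should check in `?foodweb` for your installed `mvbutils`:

- `foodweb`'s `where` argument accepts a list of environments.
- The returned `funmat` component has callers as rows and callees as columns.

`scriptDeps` and `pruneDepMat` are made-up names.

```r
# Hypothetical helper: keep rows for `ownFuns` (the callers of interest) and
# columns for everything else, then drop all-zero rows and columns.
pruneDepMat <- function(funmat, ownFuns) {
  isOwnRow <- rownames(funmat) %in% ownFuns
  isOwnCol <- colnames(funmat) %in% ownFuns
  m <- funmat[isOwnRow, !isOwnCol, drop = FALSE]
  m[rowSums(m) > 0, colSums(m) > 0, drop = FALSE]
}

# Hypothetical helper: which functions of `extPkg` does the script at `path` call?
scriptDeps <- function(path, extPkg) {
  exprs <- parse(file = path, keep.source = FALSE)
  scriptFun <- function() NULL
  body(scriptFun) <- as.call(c(as.name("{"), as.list(exprs)))
  scriptEnv <- new.env()
  assign("scriptFun", scriptFun, envir = scriptEnv)
  # Assumption: `where` may be a list of environments (see ?mvbutils::foodweb).
  # A graphics device may still open despite plotting = FALSE.
  fw <- mvbutils::foodweb(
    where = list(scriptEnv, asNamespace(extPkg)),
    plotting = FALSE
  )
  pruneDepMat(fw$funmat, "scriptFun")
}
```

The same `pruneDepMat` should also cover the package-level case. In that case `ownFuns` is the set of rows matching your own naming prefix, e.g. `grep("^myPackage\\.", rownames(fw$funmat), value = TRUE)`.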
Although I've not posted pictures of the graphs produced by `foodweb` for my code, you can see some nice examples at http://web.archive.org/web/20120413190726/http://www.sigmafield.org/2010/09/21/r-function-of-the-day-foodweb. In my case, its output definitely captures `data.table`'s usage of `:=` and `J`, along with the standard named functions, like `key` and `as.data.table`. It seems to obviate my text searches and is an improvement in several ways (e.g. finding functions that I'd overlooked).

All in all, `foodweb` is an excellent tool, and I encourage others to explore the `mvbutils` package and some of Mark Bravington's other nice packages, such as `debug`. If you do install `mvbutils`, just check out `?changed.funs` if you think that only you struggle with managing evolving R code. :)