Installing R-packages in Azure Data Lake Analytics

Posted 2019-08-15 17:52

Question:

I am having trouble installing the R packages below and referencing them in an R script that I have embedded in a U-SQL script. I have successfully run a simple R script in a U-SQL job that required no special packages. Now I am trying to create an R script that references dplyr, tidyr and reshape2. To that end, I downloaded these three packages manually as both .zip and .tar.gz files and uploaded them to my ADL account. Example:

../usqlext/samples/R/dplyr_0.7.7.zip

The U-SQL starts like this:

REFERENCE ASSEMBLY [ExtR];   //enable R extensions for the U-SQL Script

DEPLOY RESOURCE @"/usqlext/samples/R/dplyr_0.7.7.zip";
DEPLOY RESOURCE @"/usqlext/samples/R/reshape2_1.4.3.zip";
DEPLOY RESOURCE @"/usqlext/samples/R/tidyr_0.8.1.zip";

The R-script starts like this:

// declare the R script as a string variable and pass it as a parameter to the Reducer:
DECLARE @myRScript = @"
install.packages('dplyr_0.7.7.zip', repos = NULL) # installing package
unzip('dplyr_0.7.7.zip')
require(dplyr)

install.packages('tidyr_0.8.1.zip', repos = NULL) # installing package
unzip('tidyr_0.8.1.zip')
require(tidyr)

install.packages('reshape2_1.4.3.zip', repos = NULL) # installing package
unzip('reshape2_1.4.3.zip')
require(reshape2)

However, I keep getting errors which suggest that the packages are still not installed successfully. Currently I get the following error message:

Unhandled exception from user code: "Error in function_list[[i]](value) : could not find function "group_by"

That error comes from the following piece of R-code:

longStandardized <- dataset %>%
    group_by(InstallationId) %>%
    mutate(stdConsumption = znorm(tmp)) %>%
    select(InstallationId, Hournumber, stdConsumption)

Hope that someone can see what I am missing.

Thanks Jon

Answer 1:

The easy way is to download this file from your Data Lake store: usqlext\assembly\R\MRS.9.1.0.zip

Then unzip the file (on a machine without R installed) and run R.exe from the bin folder.

Now you can install any packages you want (with the parameter dependencies = TRUE):

install.packages('yourpackage', dependencies = TRUE)

Zip the folder again and replace the file in the Data Lake store with the one you just created.

Execute RegisterAllAssemblies.USQL again, and your package will be available for you!

library('yourpackage')

If you get an error saying the package cannot be found, you need this trick:

libpath = .libPaths()[1]
install.packages('yourpackage', lib = libpath)
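
Before zipping the folder again, it can help to confirm that every package really is visible to this R runtime. Note that require() only returns FALSE with a warning when a package is missing, so a failed install can go unnoticed until the first call to one of its functions, which is consistent with the "could not find function" error in the question. The helper below is a minimal sketch; the name missing_packages is hypothetical and not part of R or the U-SQL extensions.

```r
# Hypothetical helper: return the names of the packages whose namespace
# cannot be loaded from the current library paths.
missing_packages <- function(pkgs) {
  available <- vapply(
    pkgs,
    function(p) requireNamespace(p, quietly = TRUE),
    logical(1)
  )
  pkgs[!available]
}

# Packages from the question; an empty result means all of them can be loaded.
print(missing_packages(c("dplyr", "tidyr", "reshape2")))
```

Inside the R script itself, calling library() instead of require() has a similar effect: library() stops with an error as soon as a package is missing.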


Answer 2:

The answer from "Jorge Ribeiro" works very well. However, even after following those steps, you may still end up with an error such as:

 Unhandled exception from user code: "Specified directory not found:
 'D:\5827d493\bin\x64'". The details includes more information
 including any inner exceptions and the stack trace where the exception
 was raised.

In this scenario, the following steps resolve the issue:

  1. Download only the file

    /usqlext/assembly/R/MRS.9.1.0.zip

     from Azure Data Lake to your local machine.

  2. Extract all the files from the zipped file (MRS.9.1.0.zip) into the same location and run R.exe from the bin folder. Note: It does not matter whether the machine already has R installed. You can download and unzip MRS.9.1.0.zip on any machine.
  3. Install all the packages you want (with the parameters dependencies = TRUE and lib = libpath).

libpath = "Path of the library folder under the extracted files (i.e. the unzipped library folder name)"

install.packages('yourpackage', dependencies = TRUE, lib = libpath)

  4. Select all the files (Ctrl + A) and zip them again in the same location/folder.

Note: Always select all the files and zip them back into the same folder, then rename the zipped file if needed so that it matches the name used in the RegisterAllAssemblies file. Otherwise, you will always end up with the error mentioned above.

  5. Upload the file you have just created to the Data Lake store, replacing the existing one.
  6. Execute RegisterAllAssemblies.USQL; your libraries are available for use afterwards.
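
The install step above can be sketched in R as follows. This is only an illustration under assumptions: install_into is a hypothetical helper name, and the path in the commented example is a placeholder for wherever you extracted MRS.9.1.0.zip. The guard stops early if the library folder does not exist.

```r
# Hypothetical helper: install packages into the library folder of the
# extracted runtime, stopping early if that folder does not exist.
install_into <- function(libpath, pkgs) {
  if (!dir.exists(libpath)) {
    stop("Library folder not found: ", libpath)
  }
  install.packages(pkgs, dependencies = TRUE, lib = libpath)
  # Report which of the requested packages now have a folder in libpath.
  pkgs %in% list.files(libpath)
}

# Example call (placeholder path; replace with your own extraction folder):
# install_into("C:/temp/MRS.9.1.0/library", c("dplyr", "tidyr", "reshape2"))
```

The function returns one logical value per requested package, so any FALSE entry points to a package that did not land in the folder you are about to zip.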


Tags: r u-sql