I am having trouble installing the R packages below and referencing them in an R script that I have embedded in a U-SQL script. I succeeded in running a simple R script in a U-SQL job that required no special packages. Now I am trying to create an R script that references dplyr, tidyr and reshape2. I have therefore downloaded these three packages manually as both .zip and .tar.gz files and uploaded them to my ADL account. Example:
../usqlext/samples/R/dplyr_0.7.7.zip
The U-SQL script starts like this:
REFERENCE ASSEMBLY [ExtR]; //enable R extensions for the U-SQL Script
DEPLOY RESOURCE @"/usqlext/samples/R/dplyr_0.7.7.zip";
DEPLOY RESOURCE @"/usqlext/samples/R/reshape2_1.4.3.zip";
DEPLOY RESOURCE @"/usqlext/samples/R/tidyr_0.8.1.zip";
The R-script starts like this:
// declare the R script as a string variable and pass it as a parameter to the Reducer:
DECLARE @myRScript = @"
install.packages('dplyr_0.7.7.zip', repos = NULL) # installing package
unzip('dplyr_0.7.7.zip')
require(dplyr)
install.packages('tidyr_0.8.1.zip', repos = NULL) # installing package
unzip('tidyr_0.8.1.zip')
require(tidyr)
install.packages('reshape2_1.4.3.zip', repos = NULL) # installing package
unzip('reshape2_1.4.3.zip')
require(reshape2)
However, I keep getting errors that suggest the packages are still not installed successfully. Currently I get the following error message:
Unhandled exception from user code: "Error in function_list[[i]](value) : could not find function "group_by"
That error comes from the following piece of R-code:
longStandardized <- dataset %>%
group_by(InstallationId) %>%
mutate(stdConsumption = znorm(tmp)) %>%
select(InstallationId, Hournumber, stdConsumption)
Hope that someone can see what I am missing.
Thanks
Jon
The easy way to do it is to download this file from the Data Lake store:
usqlext\assembly\R\MRS.9.1.0.zip
Then unzip the file (on a machine without R installed) and run R.exe from the bin folder.
Now you can install all the packages you want (with the parameter dependencies = TRUE):
install.packages('yourpackage', dependencies = TRUE)
Zip the folder again and replace the file on the Data Lake with the one you created.
Execute RegisterAllAssemblies.USQL
again, and your package will be available for you!
library('yourpackage')
If you get a "package not found" error, use this trick:
libpath = .libPaths()[1]
install.packages('yourpackage', lib = libpath)
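Before zipping the folder again, it may help to confirm that the packages really landed in the library you installed into. The following is only a minimal sketch: `missing_packages` is a hypothetical helper name, and the package list is taken from the question. It only reads the library folder; the actual install call is left commented out because it needs network access.

```r
# Hypothetical helper (not part of the answer): list the wanted packages
# that are absent from one specific library folder.
missing_packages <- function(pkgs, lib) {
  installed <- rownames(installed.packages(lib.loc = lib))
  setdiff(pkgs, installed)
}

libpath <- .libPaths()[1]                    # same library path as in the trick above
wanted  <- c("dplyr", "tidyr", "reshape2")   # packages named in the question

todo <- missing_packages(wanted, libpath)
if (length(todo) > 0) {
  message("Not yet in ", libpath, ": ", paste(todo, collapse = ", "))
  # install.packages(todo, dependencies = TRUE, lib = libpath)  # needs network access
}
```

If `todo` comes back empty when run from the unzipped R.exe, the packages are in the bundled library and should be included when you zip the folder.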
The answer from "Jorge Ribeiro" works very well. However, even after following those steps you may still end up with an error like this:
Unhandled exception from user code: "Specified directory not found:
'D:\5827d493\bin\x64'". The details includes more information
including any inner exceptions and the stack trace where the exception
was raised.
In that case, the following steps resolve the issue:
- Download only
/usqlext/assembly/R/MRS.9.1.0.zip
from Azure Data Lake to your local machine.
- Extract all the files from the zipped file (MRS.9.1.0.zip) into the same location and run R.exe from the bin folder.
Note: It does not matter whether the machine you download and unzip MRS.9.1.0.zip on has a prior R installation. You can download and unzip it on any machine.
- Install all the packages you want (with the parameters dependencies = TRUE and lib = libpath).
libpath = "Path of the library folder under the extracted files (i.e. the unzipped library folder name)"
install.packages('yourpackage', dependencies = TRUE, lib = libpath)
- Select all the files (Ctrl + A) and zip them in the same location/folder again.
Note: Always select all the files and zip them back into the same folder, and rename the zipped file if needed so that it matches the name used in the RegisterAllAssemblies file. Otherwise, you will always end up with the error mentioned above.
- Upload the file you have just created to the Data Lake, replacing the existing one.
- Execute
RegisterAllAssemblies.USQL
and your libraries will be available for use afterwards.
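The "Specified directory not found ... bin\x64" error suggests that the zip layout matters. Assuming (this is my reading of the error, not something confirmed by the docs) that bin/x64 has to sit at the root of the archive rather than under an extra parent folder, you can check the re-zipped file before uploading it. `bin_at_root` is a hypothetical helper name, and the zip is assumed to be in the current working directory.

```r
# Hypothetical check: TRUE if the entry names place bin/x64 at the archive root.
bin_at_root <- function(entry_names) {
  entry_names <- gsub("\\\\", "/", entry_names)   # normalise Windows separators
  any(grepl("^bin/x64(/|$)", entry_names))
}

zip_path <- "MRS.9.1.0.zip"   # the re-zipped file, assumed to be in the working directory
if (file.exists(zip_path)) {
  entries <- unzip(zip_path, list = TRUE)$Name    # list entries without extracting
  if (!bin_at_root(entries)) {
    stop("bin/x64 is not at the root of ", zip_path,
         "; zip the selected files, not their parent folder")
  }
}
```

If the check stops with that message, the archive was most likely created from the parent folder instead of from the selected files inside it.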