Unable to install R/rmr2 on AWS EMR

2019-09-02 04:16发布

问题:

Having spent around a week trying to install R and rmr2 on AWS-EMR, I am turning to you all for a little help. My bootstrap script is successfully installing R 2.14.1-1~lennycran.0 (thanks to JD Long's blog). When I am trying to install rmr2 I am having the classic dependency problem. Seems I have to install packages like Rcpp, RJSONIO, bitops, digest and 5 more. Because only an older Rcpp works with R 2.14.1, I am downloading a named version and installing it. How old, I don't know - I randomly tried a few versions and 0.8.9 worked. I will make a few more hit-and-trials.

sudo curl -o Rcpp.tar.gz http://cran.us.r-project.org/src/contrib/Archive/Rcpp/Rcpp_0.8.9.tar.gz
sudo R CMD INSTALL Rcpp.tar.gz

Now I am supposed to install the rest of the dependencies (How?)

And eventually rmr2 would be installed. I am using the following script, which, of course fails -

sudo wget --no-check-certificate -o rmr2.tar.qz -S -T 10 -t 5 http://goo.gl/dvBric
sudo R CMD INSTALL rmr2.tar.gz

My question is -

What should be a simple bootstrap script for installing the rest of the dependencies ("RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2", "caTools")? Do I have to worry about compatibility of those packages as well?

Here is my complete bootstrap.sh code -

#!/bin/bash

#debian R upgrade

gpg --keyserver pgpkeys.mit.edu --recv-key 06F90DE5381BA480
gpg -a --export 06F90DE5381BA480 | sudo apt-key add -
echo "deb http://streaming.stat.iastate.edu/CRAN/bin/linux/debian lenny-cran/" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get -t lenny-cran install --yes --force-yes r-base r-base-dev

sudo curl -o rmr2.tar.gz http://goo.gl/dvBric
sudo R CMD INSTALL rmr2.tar.gz <<<< Does not go beyond this.

set -e
bucket=muxxx-bisxxx-bucket
path=input.tar.gz
wget -S -T 10 -t 5 http://$bucket.s3.amazonaws.com/$path
mkdir -p /home/hadoop/contents
tar -C /home/hadoop/contents -xzf input.tar.gz

export HADOOP_CMD=/home/hadoop/bin/hadoop
export HADOOP_STREAMING=/home/hadoop/contrib/streaming/hadoop_streaming.jar

/home/hadoop/bin/hadoop fs -mkdir /home/hadoop/contents
/home/hadoop/bin/hadoop fs -put /home/hadoop/contents/* /home/hadoop/contents/

回答1:

I have not resolved my problem on hand but I got a direction. I added the following line of code in the bootstrap script after R 2.14.1 installation and before rmr2 installation -

sudo Rscript -e 'install.packages(c("rJava", "Rcpp", "RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2", "caTools"), repos="http://ftp.heanet.ie/mirrors/cran.r-project.org/")'

Currently the bootstrapping process breaks down at plyr, which I guess, is due to older version of Rcpp that I have.

I am closing this post.