I am trying to run a simple rmr job using the RHadoop package, but it is not working. Here is my R script:
print("Initializing variable.....")
Sys.setenv(HADOOP_HOME="/usr/hdp/2.2.4.2-2/hadoop")
Sys.setenv(HADOOP_CMD="/usr/hdp/2.2.4.2-2/hadoop/bin/hadoop")
print("Invoking functions.......")
# Reference taken from Revolution Analytics
wordcount = function(input, output = NULL, pattern = " ")
{
    mapreduce(
        input = input,
        output = output,
        input.format = "text",
        map = wc.map,
        reduce = wc.reduce,
        combine = T)
}
wc.map =
    function(., lines) {
        keyval(
            unlist(
                strsplit(
                    x = lines,
                    split = pattern)),
            1)}
wc.reduce =
    function(word, counts) {
        keyval(word, sum(counts))}
# Function invoke
wordcount('/user/hduser/rmr/wcinput.txt')
I am running the above script as:
Rscript wordcount.r
I am getting the error below:
[1] "Initializing variable....."
[1] "Invoking functions......."
Error in wordcount("/user/hduser/rmr/wcinput.txt") :
could not find function "mapreduce"
Execution halted
Kindly let me know what the issue is.
Firstly, you'll have to set the HADOOP_STREAMING environment variable in your code. Try the code below, and note that it assumes you have copied your text file to the HDFS folder examples/wordcount/data.
R Code:
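A minimal sketch of such a script, not a verified listing. It assumes the rmr2 package is installed on this node and on the task nodes, and that the streaming jar sits at the path shown, which is a guess based on the HDP layout in the question and should be checked on your cluster. Loading rmr2 with library() is what makes mapreduce() and keyval() visible, which is the likely cause of the "could not find function" error; the map and reduce functions are also moved inside wordcount() so that pattern is in scope.

```r
# Paths taken from the question.
Sys.setenv(HADOOP_HOME = "/usr/hdp/2.2.4.2-2/hadoop")
Sys.setenv(HADOOP_CMD = "/usr/hdp/2.2.4.2-2/hadoop/bin/hadoop")
# Assumption: location of the streaming jar; verify it on your cluster.
Sys.setenv(HADOOP_STREAMING =
  "/usr/hdp/2.2.4.2-2/hadoop-mapreduce/hadoop-streaming.jar")

# Without this, mapreduce() and keyval() are not defined.
library(rmr2)

wordcount <- function(input, output = NULL, pattern = " ") {
  # Defined here so that `pattern` is visible to the map function.
  wc.map <- function(., lines) {
    keyval(unlist(strsplit(x = lines, split = pattern)), 1)
  }
  wc.reduce <- function(word, counts) {
    keyval(word, sum(counts))
  }
  mapreduce(
    input = input,
    output = output,
    input.format = "text",
    map = wc.map,
    reduce = wc.reduce,
    combine = TRUE)
}

# Assumption: the text file was copied to this HDFS folder beforehand.
out <- wordcount("examples/wordcount/data")

# Pull the result back from HDFS and show it as a word/count table.
results <- from.dfs(out)
print(data.frame(word = keys(results), count = values(results)))
```

To try the logic without a cluster, rmr2's local backend can be selected with rmr.options(backend = "local") before the call, passing a local file path as the input.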
Output:
For your reference, here is another example of running an R word count MapReduce program.
Hope this helps.