There are similar questions on Stack Overflow, but none of them answers this one. According to http://grepalex.com/2013/02/25/hadoop-libjars/, the -libjars option only works if we also run export HADOOP_CLASSPATH=/path/jar1:/path/jar2. So how can I execute export HADOOP_CLASSPATH=/path/jar1:/path/jar2 on EMR so that the -libjars option works?
I have implemented a ToolRunner-based driver. It works perfectly on plain Hadoop with HDFS.
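For reference, this is roughly how I understand the setup from that article on a plain Hadoop cluster. It is only a sketch: myjob.jar, input and output are hypothetical placeholders, and the jar paths are the same placeholders as above. The point is that HADOOP_CLASSPATH takes a colon-separated list while -libjars takes a comma-separated one.

```shell
#!/bin/sh
# Placeholder jar paths, same as in the question text.
JAR1=/path/jar1
JAR2=/path/jar2

# Colon-separated: makes the jars visible to the client JVM that submits the job.
export HADOOP_CLASSPATH="$JAR1:$JAR2"

# Comma-separated: the list handed to -libjars so the jars are shipped to the tasks.
LIBJARS="$JAR1,$JAR2"

# myjob.jar, input and output are hypothetical; Alert is the main class from my run.
# Generic options such as -libjars go before the job's own arguments.
CMD="hadoop jar myjob.jar Alert -libjars $LIBJARS input output"

# Print the command instead of running it, so the sketch is safe to paste anywhere.
echo "$CMD"
```

On EMR there is no obvious place to run the export before the step starts, which is the heart of my question.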
I tried running it on EMR as a custom jar, but it throws java.lang.NoClassDefFoundError: org/json/simple/parser/JSONParser
This is what I ran on EMR. I am using MultipleInputs plus a file to parse, which is why there are several paths among the arguments. The same arguments work when running on plain Hadoop.
Alert -libjars s3n://akshayhazari/jars/json-simple-1.1.1.jar -D mapred.output.compress=true -D mapred.output.compression.type=BLOCK -D io.seqfile.compression.type=BLOCK -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec s3n://akshayhazari/rule/rule1.json s3n://akshayhazari/Alert/input/data.txt.gz s3n://akshayhazari/Alert/input/data1.txt.gz s3n://akshayhazari/Alert/output
Any help is appreciated.
Building on Sandesh's answer, here is what I did to build a fat jar. Then run ant build-jar.
Then, after specifying the path to fatjar.jar in EMR, I used the following as arguments.
Can you try creating a fat jar and running that? Build a single jar that bundles your job classes together with their dependencies, then run it on EMR. It will work.
In an Ant build you can do this as follows:
<zip destfile="/lib/abc-fatjar.jar">
    <zipgroupfileset dir="lib" includes="jobcustomjar.jar,json-simple-1.1.1.jar" />
</zip>
For a Hadoop Streaming job, where you can't bundle your code into one big jar, you can use the following trick. (In my case, I created my own Java classes for custom input and output formats. The same trick would apply to custom splitters or anything else.)
Create a Jar containing your custom classes
Upload the Jar to S3:
Create a shell script that fetches the Jar and copies it to the Master node:
Upload the shell script to S3:
When creating your EMR job, run your jar-fetcher script before your streaming job:
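The steps above can be sketched roughly as follows. This is a minimal illustration, not the exact commands I used: the bucket, jar name, script name and target directory are all hypothetical placeholders, and the target directory is assumed to be one that your streaming job's classpath picks up on the master node.

```shell
#!/bin/sh
# Hypothetical names throughout; replace them with your own.
BUCKET=s3://my-bucket
JAR=custom-formats.jar
TARGET_DIR=/home/hadoop/lib

# Write the jar-fetcher script that will later run on the master node.
# The heredoc is unquoted on purpose so the variables above are expanded now.
cat > fetch-jar.sh <<EOF
#!/bin/bash
# Copy the custom jar from S3 to the master node's local disk.
mkdir -p $TARGET_DIR
hadoop fs -copyToLocal $BUCKET/jars/$JAR $TARGET_DIR/$JAR
EOF
chmod +x fetch-jar.sh

# The two upload steps, left commented out because they need a configured AWS CLI:
# aws s3 cp "$JAR" "$BUCKET/jars/$JAR"
# aws s3 cp fetch-jar.sh "$BUCKET/scripts/fetch-jar.sh"
```

How the fetcher script is wired in as a step ahead of the streaming job depends on the EMR tooling you use, so that part is not shown here.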