I'm using cassandra-all 2.0.7 api with hadoop 2.2.0.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>zazzercode</groupId>
<artifactId>doctorhere-engine-writer</artifactId>
<version>1.0</version>
<packaging>jar</packaging>
<name>DoctorhereEngineWriter</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<cassandra.version>2.0.7</cassandra.version>
<hector.version>1.0-2</hector.version>
<guava.version>15.0</guava.version>
<hadoop.version>2.2.0</hadoop.version>
</properties>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<configuration>
<source>1.6</source>
<target>1.6</target>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<archive>
<manifest>
<mainClass>zazzercode.DiseaseCountJob</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>me.prettyprint</groupId>
<artifactId>hector-core</artifactId>
<version>${hector.version}</version>
<exclusions>
<exclusion>
<artifactId>org.apache.thrift</artifactId>
<groupId>libthrift</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.cassandra</groupId>
<artifactId>cassandra-all</artifactId>
<version>${cassandra.version}</version>
<exclusions>
<exclusion>
<artifactId>libthrift</artifactId>
<groupId>org.apache.thrift</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.cassandra</groupId>
<artifactId>cassandra-thrift</artifactId>
<version>${cassandra.version}</version>
<exclusions>
<exclusion>
<artifactId>libthrift</artifactId>
<groupId>org.apache.thrift</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libthrift</artifactId>
<version>0.7.0</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>${guava.version}</version>
</dependency>
<dependency>
<groupId>com.googlecode.concurrentlinkedhashmap</groupId>
<artifactId>concurrentlinkedhashmap-lru</artifactId>
<version>1.3</version>
</dependency>
</dependencies>
</project>
When I start the jar (created after mvn assembly:assembly
from normal user prayagupd
) as below from hduser
,
hduser@prayagupd$ hadoop jar target/doctorhere-engine-writer-1.0-jar-with-dependencies.jar /user/hduser/shakespeare
I get following guava collection error on cassandra api,
14/11/23 17:51:04 WARN mapred.LocalJobRunner: job_local800673408_0001
java.lang.NoSuchMethodError: com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;
at org.apache.cassandra.config.Config.<init>(Config.java:53)
at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:105)
at org.apache.cassandra.hadoop.BulkRecordWriter.<init>(BulkRecordWriter.java:105)
at org.apache.cassandra.hadoop.BulkRecordWriter.<init>(BulkRecordWriter.java:90)
at org.apache.cassandra.hadoop.BulkOutputFormat.getRecordWriter(BulkOutputFormat.java:69)
at org.apache.cassandra.hadoop.BulkOutputFormat.getRecordWriter(BulkOutputFormat.java:29)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:558)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:632)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:405)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:445)
14/11/23 17:51:04 INFO mapreduce.Job: map 100% reduce 0%
Line#53 of cassandra api's Config.java has this code,
public Set<String> hinted_handoff_enabled_by_dc = Sets.newConcurrentHashSet();
Whereas, I find Sets class with the jar itself,
hduser@prayagupd$ jar tvf target/doctorhere-engine-writer-1.0-jar-with-dependencies.jar | grep com/google/common/collect/Sets
2358 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$1.class
2019 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$2.class
1705 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$3.class
1327 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$CartesianSet$1.class
4224 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$CartesianSet.class
5677 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$DescendingSet.class
4187 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$FilteredNavigableSet.class
1567 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$FilteredSet.class
2614 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$FilteredSortedSet.class
1174 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$ImprovedAbstractSet.class
1361 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$PowerSet$1.class
3727 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$PowerSet.class
1398 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$SetView.class
1950 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$SubSet$1.class
2058 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$SubSet.class
4159 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$UnmodifiableNavigableSet.class
17349 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets.class
Also, there exists the method when I checked the jar as below,
hduser@prayagupd$ javap -classpath target/doctorhere-engine-writer-1.0-jar-with-dependencies.jar com.google.common.collect.Sets | grep newConcurrentHashSet
public static <E extends java/lang/Object> java.util.Set<E> newConcurrentHashSet();
public static <E extends java/lang/Object> java.util.Set<E> newConcurrentHashSet(java.lang.Iterable<? extends E>);
I see com.google.guava
under /META/INF/maven
library when I navigate the jar file,
I have following artifacts in ~/.m2
from outside of hdfs user,
$ ll ~/.m2/repository/com/google/guava/guava
total 20
drwxrwxr-x 5 prayagupd prayagupd 4096 Nov 23 20:05 ./
drwxrwxr-x 4 prayagupd prayagupd 4096 Nov 23 20:05 ../
drwxrwxr-x 2 prayagupd prayagupd 4096 Nov 23 20:05 11.0.2/
drwxrwxr-x 2 prayagupd prayagupd 4096 Nov 23 20:06 15.0/
drwxrwxr-x 2 prayagupd prayagupd 4096 Nov 23 20:05 r09/
And hadoop classpath is
$ hadoop classpath
/usr/local/hadoop-2.2.0/etc/hadoop:
/usr/local/hadoop2.2.0/share/hadoop/common/lib/*:
/usr/local/hadoop-2.2.0/share/hadoop/common/*:
/usr/local/hadoop-2.2.0/share/hadoop/hdfs:
/usr/local/hadoop-2.2.0/share/hadoop/hdfs/lib/*:
/usr/local/hadoop-2.2.0/share/hadoop/hdfs/*:
/usr/local/hadoop-2.2.0/share/hadoop/yarn/lib/*:
/usr/local/hadoop-2.2.0/share/hadoop/yarn/*:
/usr/local/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:
/usr/local/hadoop-2.2.0/share/hadoop/mapreduce/*:
/usr/local/hadoop-2.2.0/contrib/capacity-scheduler/*.jar
dependency tree is as below where com.google.guava:guava:jar:r09:compile
is used by me.prettyprint:hector-core:jar:1.0-2:compile
, whereas guava-11.0.2.jar
is used by hadoop-2.2.0
or hadoop-2.6.0
and cassandra-2.0.6
uses guava-15.0..jar
$ find /usr/local/apache-cassandra-2.0.6/ -name "guava*"
/usr/local/apache-cassandra-2.0.6/lib/guava-15.0.jar
/usr/local/apache-cassandra-2.0.6/lib/licenses/guava-15.0.txt
$ mvn dependency:tree
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building DoctorhereEngineWriter 1.0
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-dependency-plugin:2.1:tree (default-cli) @ doctorhere-engine-writer ---
[INFO] zazzercode:doctorhere-engine-writer:jar:1.0
[INFO] +- junit:junit:jar:3.8.1:test (scope not updated to compile)
[INFO] +- me.prettyprint:hector-core:jar:1.0-2:compile
[INFO] | +- commons-lang:commons-lang:jar:2.4:compile
[INFO] | +- commons-pool:commons-pool:jar:1.5.3:compile
[INFO] | +- com.google.guava:guava:jar:r09:compile
[INFO] | +- org.slf4j:slf4j-api:jar:1.6.1:compile
[INFO] | +- com.github.stephenc.eaio-uuid:uuid:jar:3.2.0:compile
[INFO] | \- com.ecyrd.speed4j:speed4j:jar:0.9:compile
[INFO] +- org.apache.cassandra:cassandra-all:jar:2.0.7:compile
[INFO] | +- org.xerial.snappy:snappy-java:jar:1.0.5:compile
[INFO] | +- net.jpountz.lz4:lz4:jar:1.2.0:compile
[INFO] | +- com.ning:compress-lzf:jar:0.8.4:compile
[INFO] | +- commons-cli:commons-cli:jar:1.1:compile
[INFO] | +- commons-codec:commons-codec:jar:1.2:compile
[INFO] | +- org.apache.commons:commons-lang3:jar:3.1:compile
[INFO] | +- com.googlecode.concurrentlinkedhashmap:concurrentlinkedhashmap-lru:jar:1.3:compile
[INFO] | +- org.antlr:antlr:jar:3.2:compile
[INFO] | | \- org.antlr:antlr-runtime:jar:3.2:compile
[INFO] | | \- org.antlr:stringtemplate:jar:3.2:compile
[INFO] | | \- antlr:antlr:jar:2.7.7:compile
[INFO] | +- org.codehaus.jackson:jackson-core-asl:jar:1.9.2:compile
[INFO] | +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.2:compile
[INFO] | +- jline:jline:jar:1.0:compile
[INFO] | +- com.googlecode.json-simple:json-simple:jar:1.1:compile
[INFO] | +- com.github.stephenc.high-scale-lib:high-scale-lib:jar:1.1.2:compile
[INFO] | +- org.yaml:snakeyaml:jar:1.11:compile
[INFO] | +- edu.stanford.ppl:snaptree:jar:0.1:compile
[INFO] | +- org.mindrot:jbcrypt:jar:0.3m:compile
[INFO] | +- com.yammer.metrics:metrics-core:jar:2.2.0:compile
[INFO] | +- com.addthis.metrics:reporter-config:jar:2.1.0:compile
[INFO] | | \- org.hibernate:hibernate-validator:jar:4.3.0.Final:compile
[INFO] | | +- javax.validation:validation-api:jar:1.0.0.GA:compile
[INFO] | | \- org.jboss.logging:jboss-logging:jar:3.1.0.CR2:compile
[INFO] | +- com.thinkaurelius.thrift:thrift-server:jar:0.3.3:compile
[INFO] | | \- com.lmax:disruptor:jar:3.0.1:compile
[INFO] | +- net.sf.supercsv:super-csv:jar:2.1.0:compile
[INFO] | +- log4j:log4j:jar:1.2.16:compile
[INFO] | +- com.github.stephenc:jamm:jar:0.2.5:compile
[INFO] | \- io.netty:netty:jar:3.6.6.Final:compile
[INFO] +- org.apache.cassandra:cassandra-thrift:jar:2.0.7:compile
[INFO] +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
[INFO] | +- org.apache.hadoop:hadoop-common:jar:2.2.0:compile
[INFO] | | +- org.apache.commons:commons-math:jar:2.1:compile
[INFO] | | +- xmlenc:xmlenc:jar:0.52:compile
[INFO] | | +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] | | +- commons-io:commons-io:jar:2.1:compile
[INFO] | | +- commons-net:commons-net:jar:3.1:compile
[INFO] | | +- commons-logging:commons-logging:jar:1.1.1:compile
[INFO] | | +- commons-configuration:commons-configuration:jar:1.6:compile
[INFO] | | | +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] | | | +- commons-digester:commons-digester:jar:1.8:compile
[INFO] | | | | \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] | | | \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] | | +- org.apache.avro:avro:jar:1.7.4:compile
[INFO] | | | \- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
[INFO] | | +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
[INFO] | | +- org.apache.hadoop:hadoop-auth:jar:2.2.0:compile
[INFO] | | +- org.apache.zookeeper:zookeeper:jar:3.4.5:compile
[INFO] | | \- org.apache.commons:commons-compress:jar:1.4.1:compile
[INFO] | | \- org.tukaani:xz:jar:1.0:compile
[INFO] | +- org.apache.hadoop:hadoop-hdfs:jar:2.2.0:compile
[INFO] | | \- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
[INFO] | +- org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.2.0:compile
[INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.2.0:compile
[INFO] | | | +- org.apache.hadoop:hadoop-yarn-client:jar:2.2.0:compile
[INFO] | | | \- org.apache.hadoop:hadoop-yarn-server-common:jar:2.2.0:compile
[INFO] | | \- org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.2.0:compile
[INFO] | +- org.apache.hadoop:hadoop-yarn-api:jar:2.2.0:compile
[INFO] | +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.2.0:compile
[INFO] | | \- org.apache.hadoop:hadoop-yarn-common:jar:2.2.0:compile
[INFO] | +- org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.2.0:compile
[INFO] | \- org.apache.hadoop:hadoop-annotations:jar:2.2.0:compile
[INFO] \- org.apache.thrift:libthrift:jar:0.7.0:compile
[INFO] +- javax.servlet:servlet-api:jar:2.5:compile
[INFO] \- org.apache.httpcomponents:httpclient:jar:4.0.1:compile
[INFO] \- org.apache.httpcomponents:httpcore:jar:4.0.1:compile
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 27.124s
[INFO] Finished at: Wed Mar 18 01:39:42 CDT 2015
[INFO] Final Memory: 15M/982M
[INFO] ------------------------------------------------------------------------
Here's hadoop script for hadoop 2.2.0
,
$ cat /usr/local/hadoop-2.2.0/bin/hadoop
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script runs the hadoop core commands.
bin=`which $0`
bin=`dirname ${bin}`
bin=`cd "$bin"; pwd`
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh
export HADOOP_USER_CLASSPATH_FIRST=true
function print_usage(){
echo "Usage: hadoop [--config confdir] COMMAND"
echo " where COMMAND is one of:"
echo " fs run a generic filesystem user client"
echo " version print the version"
echo " jar <jar> run a jar file"
echo " checknative [-a|-h] check native hadoop and compression libraries availability"
echo " distcp <srcurl> <desturl> copy file or directories recursively"
echo " archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive"
echo " classpath prints the class path needed to get the"
echo " Hadoop jar and the required libraries"
echo " daemonlog get/set the log level for each daemon"
echo " or"
echo " CLASSNAME run the class named CLASSNAME"
echo ""
echo "Most commands print help when invoked w/o parameters."
}
if [ $# = 0 ]; then
print_usage
exit
fi
COMMAND=$1
case $COMMAND in
# usage flags
--help|-help|-h)
print_usage
exit
;;
#hdfs commands
namenode|secondarynamenode|datanode|dfs|dfsadmin|fsck|balancer|fetchdt|oiv|dfsgroups|portmap|nfs3)
echo "DEPRECATED: Use of this script to execute hdfs command is deprecated." 1>&2
echo "Instead use the hdfs command for it." 1>&2
echo "" 1>&2
#try to locate hdfs and if present, delegate to it.
shift
if [ -f "${HADOOP_HDFS_HOME}"/bin/hdfs ]; then
exec "${HADOOP_HDFS_HOME}"/bin/hdfs ${COMMAND/dfsgroups/groups} "$@"
elif [ -f "${HADOOP_PREFIX}"/bin/hdfs ]; then
exec "${HADOOP_PREFIX}"/bin/hdfs ${COMMAND/dfsgroups/groups} "$@"
else
echo "HADOOP_HDFS_HOME not found!"
exit 1
fi
;;
#mapred commands for backwards compatibility
pipes|job|queue|mrgroups|mradmin|jobtracker|tasktracker)
echo "DEPRECATED: Use of this script to execute mapred command is deprecated." 1>&2
echo "Instead use the mapred command for it." 1>&2
echo "" 1>&2
#try to locate mapred and if present, delegate to it.
shift
if [ -f "${HADOOP_MAPRED_HOME}"/bin/mapred ]; then
exec "${HADOOP_MAPRED_HOME}"/bin/mapred ${COMMAND/mrgroups/groups} "$@"
elif [ -f "${HADOOP_PREFIX}"/bin/mapred ]; then
exec "${HADOOP_PREFIX}"/bin/mapred ${COMMAND/mrgroups/groups} "$@"
else
echo "HADOOP_MAPRED_HOME not found!"
exit 1
fi
;;
classpath)
echo $CLASSPATH
exit
;;
#core commands
*)
# the core commands
if [ "$COMMAND" = "fs" ] ; then
CLASS=org.apache.hadoop.fs.FsShell
elif [ "$COMMAND" = "version" ] ; then
CLASS=org.apache.hadoop.util.VersionInfo
elif [ "$COMMAND" = "jar" ] ; then
CLASS=org.apache.hadoop.util.RunJar
elif [ "$COMMAND" = "checknative" ] ; then
CLASS=org.apache.hadoop.util.NativeLibraryChecker
elif [ "$COMMAND" = "distcp" ] ; then
CLASS=org.apache.hadoop.tools.DistCp
CLASSPATH=${CLASSPATH}:${TOOL_PATH}
elif [ "$COMMAND" = "daemonlog" ] ; then
CLASS=org.apache.hadoop.log.LogLevel
elif [ "$COMMAND" = "archive" ] ; then
CLASS=org.apache.hadoop.tools.HadoopArchives
CLASSPATH=${CLASSPATH}:${TOOL_PATH}
elif [[ "$COMMAND" = -* ]] ; then
# class and package names cannot begin with a -
echo "Error: No command named \`$COMMAND' was found. Perhaps you meant \`hadoop ${COMMAND#-}'"
exit 1
else
CLASS=$COMMAND
fi
shift
# Always respect HADOOP_OPTS and HADOOP_CLIENT_OPTS
HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
#make sure security appender is turned off
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,NullAppender}"
export CLASSPATH=$CLASSPATH
exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
;;
esac
How could this google-collection issue be fixed?
Actual code here
git clone --branch doctor-engine-writer https://github.com/prayagupd/doctorhere
cd doctorhere/doctorhere-engine-writer
References
Hadoop library conflict at mapreduce time