Cloudera Spark tutorial won't work: ClassNotFoundException on spark-submit

Posted 2019-09-15 18:14

Question:

I tried the solutions suggested in similar existing posts, but none of them works for me :-( I'm getting really hopeless, so I decided to post this as a new question.

I tried a tutorial (link below) on building a first Scala or Java application with Spark in a Cloudera VM.

This is my spark-submit command and its output:

[cloudera@quickstart sparkwordcount]$ spark-submit --class com.cloudera.sparkwordcount.SparkWordCount --master local  /home/cloudera/src/main/scala/com/cloudera/sparkwordcount/target/sparkwordcount-0.0.1-SNAPSHOT.jar
java.lang.ClassNotFoundException: com.cloudera.sparkwordcount.SparkWordCount
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:176)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I also tried updating the pom.xml file with my actual CDH, Spark and Scala versions, but it still does not work.

When I extract the jar file that Maven previously generated via mvn package, I cannot find any .class file inside its folder hierarchy.

Sorry, I am a bit new to Cloudera and Spark. I basically tried to follow this tutorial with Scala: https://blog.cloudera.com/blog/2014/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/

I checked the class, folder and Scala file names very closely quite a few times, especially for lowercase/uppercase issues, and nothing seemed wrong.

I opened my jar and there is some file hierarchy in it; in the deepest folder I can find the pom.xml file again, but I cannot see any .class files anywhere inside the jar. Does this mean the compilation via "mvn package" didn't actually work, even though the console output said the build was successful?
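A quick way to confirm whether the jar really contains compiled classes is to list its entries, since a jar is an ordinary zip archive. The following is only a minimal sketch in Python; the function name list_class_entries is made up for illustration and is not part of the tutorial or of Spark.

```python
import zipfile


def list_class_entries(jar_path):
    """Return the names of all .class entries in a jar.

    A jar file is a zip archive, so the standard zipfile module can read it.
    jar_path is whatever path you pass in, for example the jar that
    "mvn package" wrote into the target directory.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist() if name.endswith(".class")]


# Hypothetical usage (the path is taken from the question, adjust as needed):
#   for entry in list_class_entries("target/sparkwordcount-0.0.1-SNAPSHOT.jar"):
#       print(entry)
```

If the returned list is empty, the jar holds no compiled classes at all, so spark-submit cannot find any class in it, whatever name is passed to --class.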

Answer 1:

I was having the same issue. Try rerunning after changing the class name from

--class com.cloudera.sparkwordcount.SparkWordCount

to

--class SparkWordCount

The full command I used looked like this:

spark-submit --class SparkWordCount --master local --deploy-mode client --executor-memory 1g --name wordcount --conf "spark.app.id=wordcount" target/sparkwordcount-0.0.1-SNAPSHOT.jar /user/cloudera/inputfile.txt 2
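The reason a bare class name can work is presumably that the class was compiled without a package, so it sits at the root of the jar instead of under com/cloudera/sparkwordcount/. That is an assumption about this particular build, not something the output above confirms. As a rough sketch in Python, the name to pass to --class follows directly from where the .class entry sits inside the jar; entry_to_class_name is a hypothetical helper written for illustration.

```python
def entry_to_class_name(entry):
    """Convert a jar entry such as 'a/b/C.class' into the class name 'a.b.C'."""
    suffix = ".class"
    if not entry.endswith(suffix):
        raise ValueError(f"not a class entry: {entry}")
    # Drop the suffix, then turn directory separators into package dots.
    return entry[: -len(suffix)].replace("/", ".")


# A class at the jar root has no package prefix:
print(entry_to_class_name("SparkWordCount.class"))
# prints: SparkWordCount

# A class inside package directories gets the fully qualified name:
print(entry_to_class_name("com/cloudera/sparkwordcount/SparkWordCount.class"))
# prints: com.cloudera.sparkwordcount.SparkWordCount
```

So the value given to --class has to match the location of the class inside the jar, which may differ from the package the tutorial expects.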