I am new to Zeppelin and am trying to set it up on my system. So far I have done the following steps:
- Downloaded Zeppelin from here
- Set JAVA_HOME as a system environment variable
- Went to zeppelin-0.7.3-bin-all\bin and ran zeppelin.cmd
- Was able to see the Zeppelin UI at http://localhost:8090
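In cmd, that looked roughly like this (just a recap; the folder name is the default from the download):

REM JAVA_HOME is already set as a system environment variable
echo %JAVA_HOME%
cd zeppelin-0.7.3-bin-all\bin
zeppelin.cmd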
When I try to run the "Load data into table" paragraph mentioned in the Zeppelin Tutorial -> Basic Features (Spark) notebook, it throws the following error:
java.lang.NullPointerException
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_2(SparkInterpreter.java:398)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:387)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:843)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:491)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I can see the above error log in the paragraph output inside the Zeppelin UI; at the same time my console (cmd) shows the following error:
DEBUG [2018-01-11 10:55:30,059] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) - DEBUG [2018-01-11 10:55:30,059] ({pool-1-thread-3} Interpreter.java[getProperty]:165) - key: zeppelin.spark.concurrentSQL, value: false
WARN [2018-01-11 10:55:30,061] ({pool-2-thread-2} NotebookServer.java[afterStatusChange]:2064) - Job 20150210-015259_1403135953 is finished, status: ERROR, exception: null, result: %text java.lang.NullPointerException
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_2(SparkInterpreter.java:398)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:387)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:843)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:491)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Is there anything that I am missing, or does this have anything to do with Spark? I am assuming Zeppelin will take care of Spark and that we do not have to set up Spark separately. Thanks in advance for helping me!
I had the same issue with Zeppelin 0.8 on HDP 3.1 with Spark 2.3, and I think the problem was overriding libraries that are already present in the Spark interpreter. My code works with a Kafka DirectStream, but that is not important for this issue. See below for an illustration of what I commented out; after that it worked.
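My original dependency paragraph is not reproduced here, but the pattern was roughly the following sketch (the artifact coordinates and versions are only examples, adjust them to your setup):

%spark.dep
z.reset()
// spark-streaming is already bundled with the Spark interpreter; loading it again
// overrides the interpreter's own copy, which I think is what broke the interpreter:
// z.load("org.apache.spark:spark-streaming_2.11:2.3.0")
// Only the Kafka integration is not bundled, so it still has to be loaded:
z.load("org.apache.spark:spark-streaming-kafka-0-10_2.11:2.3.0")

Note that the dependency paragraph has to run before the Spark interpreter is started for the first time (or restart the interpreter before running it).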
First I would ensure that Spark is generally running.
Have you tried to run a new Spark notebook?
It tries to initialize a SparkContext. If that works we are good, and we just need to check the options and syntax used in the paragraph you are running.
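A minimal paragraph for that sanity check could look like this (nothing specific to the tutorial; sc is the SparkContext Zeppelin provides):

%spark
// If the interpreter starts correctly, this prints the Spark version and runs a trivial job;
// otherwise it should fail with the same NullPointerException as your tutorial paragraph.
println(sc.version)
println(sc.parallelize(1 to 10).sum())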
If that is not running, I would check/set the zeppelin-env.cmd with the Spark-related variables (see the sketch after the steps below).
If you didn't set up Spark on Windows yet, it cannot work.
- download the Spark version you would like to use: https://spark.apache.org/downloads.html
- unzip it into a folder of your choice (e.g. c:/hadoop/sparkVERSION on Windows)
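A minimal conf\zeppelin-env.cmd could then look like this (the paths are only examples and have to match your machine; HADOOP_HOME is only relevant if you also set up winutils.exe for the Hadoop binaries on Windows):

REM conf\zeppelin-env.cmd -- example values, adjust the paths to your system
set SPARK_HOME=c:\hadoop\sparkVERSION
REM only if you use winutils.exe for Hadoop on Windows:
REM set HADOOP_HOME=c:\hadoop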
If all those steps don't work, please show me the output and the code you are trying to run.
BR
UPDATE: The following post describes exactly how to set up Zeppelin on Windows to run the tutorials.
https://hernandezpaul.wordpress.com/2016/11/14/apache-zeppelin-installation-on-windows-10/
I just went through it on my Windows machine and it worked fine for me.