Submitting Spark jobs over Livy using curl

Posted 2020-05-03 12:32

Question:

I'm submitting Spark jobs to a Livy (0.6.0) session through curl.

The job is a large jar file whose class extends the Job interface, exactly like this answer: https://stackoverflow.com/a/49220879/8557851

I run it with this curl command:

curl -X POST -d '{"kind": "spark","files":["/config.json"],"jars":["/myjar.jar"],"driverMemory":"512M","executorMemory":"512M"}' -H "Content-Type: application/json" localhost:8998/sessions/

The code itself is exactly like the answer linked above:

package com.mycompany.test

import org.apache.livy.{Job, JobContext}
import org.apache.spark._
import org.apache.livy.scalaapi._

object Test extends Job[Boolean] {
  override def call(jc: JobContext): Boolean = {
    // Print the configuration of the SparkContext that Livy hands us
    val sc = jc.sc
    sc.getConf.getAll.foreach(println)
    true
  }
}

The error is a Java NullPointerException, as shown below:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.livy.rsc.driver.JobWrapper.cancel(JobWrapper.java:90)
    at org.apache.livy.rsc.driver.RSCDriver.shutdown(RSCDriver.java:127)
    at org.apache.livy.rsc.driver.RSCDriver.run(RSCDriver.java:356)
    at org.apache.livy.rsc.driver.RSCDriverBootstrapper.main(RSCDriverBootstrapper.java:93)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The expected output is for the job in the jar to start running.

Answer 1:

I've used the Livy REST API, and with it there are two approaches to submitting a Spark job. Please refer to the REST API docs; they give a fair understanding of Livy REST requests:

1. Batch (/batches):
You submit a request and get back a job id; using that id you poll for the status of the Spark job. Here you have the option of executing an uber jar as well as a code file, though I've never used the latter.

2. Session (/sessions and /sessions/{sessionId}/statements):
You submit the Spark job as code, so there is no need to build an uber jar. You first create a session, and within that session you execute one or more statements (the actual code).

For both approaches, the documentation has a good explanation of the corresponding REST requests and the request body/parameters.

Examples/samples are here and here.

The corrected requests would be:

Batch

curl \
  -X POST \
  -d '{
    "kind": "spark",
    "files": [
      "<use-absolute-path>"
    ],
    "file": "absolute-path-to-your-application-jar",
    "className": "fully-qualified-spark-class-name",
    "driverMemory": "512M",
    "executorMemory": "512M",
    "conf": {<any-other-configs-as-key-val>}
  }' \
  -H "Content-Type: application/json" \
  localhost:8998/batches/
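
A successful submit returns a batch id in the response; you can then poll that batch until it finishes. A minimal sketch, assuming the returned id is 0:

# Lightweight check: returns just the batch id and its current state
curl localhost:8998/batches/0/state

# Full batch info, including driver log lines, which helps with debugging
curl localhost:8998/batches/0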

Session and Statement

# Create a session
curl \
  -X POST \
  -d '{
    "kind": "spark",
    "files": [
      "<use-absolute-path>"
    ],
    "driverMemory": "512M",
    "executorMemory": "512M",
    "conf": {<any-other-configs-as-key-val>}
  }' \
  -H "Content-Type: application/json" \
  localhost:8998/sessions/
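
Session startup is not instantaneous; wait until the session state is "idle" before posting statements. A quick check, assuming the new session was assigned id 0:

# Poll the session state; statements only run once it reports "idle"
curl localhost:8998/sessions/0/state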

# Run code/statements in the session created above
curl \
  -X POST \
  -d '{
    "kind": "spark",
    "code": "spark-code"
  }' \
  -H "Content-Type: application/json" \
  localhost:8998/sessions/{sessionId}/statements
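
The POST returns a statement id; the result appears in the statement's output field once it completes. A concrete, illustrative run against a hypothetical session 0:

# Submit a one-line Scala statement to session 0
curl \
  -X POST \
  -d '{"code": "sc.parallelize(1 to 10).sum()"}' \
  -H "Content-Type: application/json" \
  localhost:8998/sessions/0/statements

# Poll the statement; "output" carries the result once its state is "available"
curl localhost:8998/sessions/0/statements/0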