import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import play.api.libs.json._
import java.util.Date
import javax.xml.bind.DatatypeConverter

object Test {
  def main(args: Array[String]): Unit = {
    val logFile = "test.txt"
    val conf = new SparkConf().setAppName("Json Test")
    val sc = new SparkContext(conf)
    try {
      val out = "output/test"
      // cleanTypo is a helper defined elsewhere in the project; parsing the cleaned
      // line with play-json is what triggers the exception below
      val logData = sc.textFile(logFile, 2).map(line => Json.parse(cleanTypo(line))).cache()
    } finally {
      sc.stop()
    }
  }
}
Because of the well-known Spark/Jackson conflict problem, I rebuilt Spark after bumping the Jackson dependencies with:

mvn versions:use-latest-versions -Dincludes=org.codehaus.jackson:jackson-core-asl
mvn versions:use-latest-versions -Dincludes=org.codehaus.jackson:jackson-mapper-asl

So the jars have been updated to 1.9.x, but I still get the error:
15/03/02 03:12:19 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NoClassDefFoundError: org/codehaus/jackson/annotate/JsonClass
at org.codehaus.jackson.map.introspect.JacksonAnnotationIntrospector.findDeserializationType(JacksonAnnotationIntrospector.java:524)
at org.codehaus.jackson.map.deser.BasicDeserializerFactory.modifyTypeByAnnotation(BasicDeserializerFactory.java:732)
at org.codehaus.jackson.map.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:427)
at org.codehaus.jackson.map.deser.StdDeserializerProvider._createDeserializer(StdDeserializerProvider.java:398)
at org.codehaus.jackson.map.deser.StdDeserializerProvider._createAndCache2(StdDeserializerProvider.java:307)
at org.codehaus.jackson.map.deser.StdDeserializerProvider._createAndCacheValueDeserializer(StdDeserializerProvider.java:287)
at org.codehaus.jackson.map.deser.StdDeserializerProvider.findValueDeserializer(StdDeserializerProvider.java:136)
at org.codehaus.jackson.map.deser.StdDeserializerProvider.findTypedValueDeserializer(StdDeserializerProvider.java:157)
at org.codehaus.jackson.map.ObjectMapper._findRootDeserializer(ObjectMapper.java:2468)
at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2383)
at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1094)
at play.api.libs.json.JacksonJson$.parseJsValue(JsValue.scala:477)
at play.api.libs.json.Json$.parse(Json.scala:16)
We hit almost exactly the same issue. We were trying to use Jackson 1.9.2 but hit a NoSuchMethodError as well.
Annoyingly there is not just one version conflict to deal with, but two. First, Spark depends on Hadoop (for HDFS), which depends on a 1.8.x build of the org.codehaus.jackson JSON libraries, and that is the conflict you are seeing. Second, Spark (at least 1.2+) uses Jackson 2.4.4 core, which moved to the com.fasterxml.jackson.core group, so it does not actually conflict with 1.8.x because the package names differ.
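A quick way to see which copy of the old org.codehaus.jackson classes actually wins on your classpath is to ask the JVM where a class was loaded from. This is just a diagnostic sketch (the class names come from the stack trace above; the object name and structure are mine, run it on the driver or inside a mapPartitions to check the executors):

object JacksonDiagnostics {
  // Return the jar (or directory) a class was loaded from, if the JVM exposes it
  def sourceOf(className: String): String =
    try {
      val src = Class.forName(className).getProtectionDomain.getCodeSource
      if (src != null) src.getLocation.toString else "bootstrap/unknown"
    } catch {
      case t: Throwable => s"not loadable: ${t.getClass.getSimpleName}"
    }

  def main(args: Array[String]): Unit = {
    // The class the stack trace says is missing
    println(sourceOf("org.codehaus.jackson.annotate.JsonClass"))
    // The introspector that tries to use it
    println(sourceOf("org.codehaus.jackson.map.introspect.JacksonAnnotationIntrospector"))
    // Jackson 2.x lives under com.fasterxml, so it cannot collide with the above
    println(sourceOf("com.fasterxml.jackson.databind.ObjectMapper"))
  }
}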
So in your case your code should work if you do one of three things:
Unfortunately there are going to be many more issues like this, given how Spark works: it already has all of its own internal dependencies on the classpath, so any conflicting job dependencies will lose out. Spark already does some dependency shading to avoid this issue with packages like Guava, but this is not currently done with Jackson.
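Since Spark does not shade Jackson for you, one workaround is to shade the conflicting packages inside your own job jar instead. Here is a minimal build.sbt sketch, assuming sbt-assembly with shading support (0.14+); the rename target shadedjackson and the version numbers are purely illustrative, not something taken from the question:

// Rename org.codehaus.jackson inside the assembled job jar so the copy your job
// (and its play-json dependency) uses cannot clash with the one Hadoop/Spark ships.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.codehaus.jackson.**" -> "shadedjackson.@1").inAll
)

// Keep Spark itself out of the fat jar; the cluster provides it at runtime.
// Versions here are examples only.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.2.1" % "provided",
  "com.typesafe.play" %% "play-json" % "2.2.2"
)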