I got this far:
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.DeserializationFeature
case class Person(name: String, lovesPandas: Boolean)
val mapper = new ObjectMapper()
val input = sc.textFile("files/pandainfo.json")
val result = input.flatMap(record => {
  try {
    Some(mapper.readValue(record, classOf[Person]))
  } catch {
    case e: Exception => None
  }
})
result.collect
but I get Array()
as the result, with no error. The file I'm reading is https://github.com/databricks/learning-spark/blob/master/files/pandainfo.json. How do I go on from here?
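In case it helps narrow things down, here is a small debugging variant I could run on the same input, mapper and Person as above, just to surface whatever exception the catch block is swallowing (the Either wrapping is my own addition, not part of my original attempt):
val diagnostics = input.map(record => {
  try {
    // keep the parsed record if it works
    Right(mapper.readValue(record, classOf[Person]))
  } catch {
    // otherwise keep the record together with the exception instead of dropping it
    case e: Exception => Left(s"$record => ${e.getClass.getName}: ${e.getMessage}")
  }
})
diagnostics.collect.foreach(println)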
After consulting the question "Spark: broadcasting jackson ObjectMapper", I tried
import org.apache.spark._
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.DeserializationFeature
case class Person(name: String, lovesPandas: Boolean)
val input = """{"name":"Sparky The Bear", "lovesPandas":true}"""
val result = input.flatMap(record => {
  try {
    val mapper = new ObjectMapper()
    mapper.registerModule(DefaultScalaModule)
    mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
    Some(mapper.readValue(record, classOf[Person]))
  } catch {
    case e: Exception => None
  }
})
result.collect
and got
Name: Compile Error
Message: <console>:34: error: overloaded method value readValue with alternatives:
[T](x$1: Array[Byte], x$2: com.fasterxml.jackson.databind.JavaType)T <and>
[T](x$1: Array[Byte], x$2: com.fasterxml.jackson.core.type.TypeReference[_])T <and>
[T](x$1: Array[Byte], x$2: Class[T])T <and>
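For reference, the single-record parse I'm ultimately trying to run inside the RDD looks roughly like this on the driver (a minimal sketch, assuming that registering DefaultScalaModule is enough for Jackson to bind the case class):
val mapper = new ObjectMapper()
mapper.registerModule(DefaultScalaModule)
mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
val json = """{"name":"Sparky The Bear", "lovesPandas":true}"""
// should produce a Person instance if the module is wired up correctly
val person = mapper.readValue(json, classOf[Person])
println(person)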