Using contains in scala - exception

2019-09-14 08:55发布

问题:

I am encountering this error: java.lang.ClassCastException: scala.collection.immutable.$colon$colon cannot be cast to [Ljava.lang.Object; whenever I try to use "contains" to find if a string is inside an array. Is there a more appropriate way of doing this? Or, am I doing something wrong? (I am fairly new to Scala)

Here is the code:

val matches = Set[JSONObject]()
val config = new SparkConf()
val sc = new SparkContext("local", "SparkExample", config)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val ebay = sqlContext.read.json("/Users/thomassquires/Downloads/products.json")
val catalogue = sqlContext.read.json("/Users/thomassquires/Documents/catalogue2.json")

val eins = ebay.map(item => (item.getAs[String]("ID"), Option(item.getAs[Set[Row]]("itemSpecifics"))))
  .filter(item => item._2.isDefined)
  .map(item => (item._1 , item._2.get.find(x => x.getAs[String]("k") == "EAN")))
  .filter(x => x._2.isDefined)
  .map(x => (x._1, x._2.get.getAs[String]("v")))
  .collect()

    def catEins =  catalogue.map(r => (r.getAs[String]("_id"), Option(r.getAs[Array[String]]("item_model_number")))).filter(r => r._2.isDefined).map(r => (r._1, r._2.get)).collect()

  def matched = for(ein <- eins) yield (ein._1, catEins.filter(z => z._2.contains(ein._2)))

The exception occurs on the last line. I have tried a few different variants.

My data structure is one List[Tuple2[String, String]] and one List[Tuple2[String, Array[String]]] . I need to find the zero or more matches from the second list that contain the string.

Thanks

回答1:

Long story short (there is still part that eludes me here*) you're using wrong types. getAs is implemented as fieldIndex (String => Int) followed by get (Int => Any) followed by asInstanceOf.

Since Spark doesn't use Arrays nor Sets but WrappedArray to store array column data, calls like getAs[Array[String]] or getAs[Set[Row]] are not valid. If you want specific types you should use either getAs[Seq[T]] or getAsSeq[T] and convert your data to desired type with toSet / toArray.


* See Why wrapping a generic method call with Option defers ClassCastException?