SQLContext implicits

Posted 2020-03-22 14:31

Question:

I am learning Spark and Scala. I am well versed in Java, but not so much in Scala. I am going through a tutorial on Spark and came across the following lines of code, which have not been explained:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

(sc is the SparkContext instance)

I know the concepts behind Scala implicits (at least I think I do). Could somebody explain to me what exactly is meant by the import statement above? What implicits are bound to the sqlContext instance when it is instantiated, and how? Are these implicits defined inside the SQLContext class?

EDIT: The following seems to work for me as well (fresh code):

val sqlc = new SQLContext(sc)
import sqlContext.implicits._

In the code just above, what exactly is sqlContext and where is it defined?

Answer 1:

From the ScalaDoc: sqlContext.implicits contains "(Scala-specific) Implicit methods available in Scala for converting common Scala objects into DataFrames."

It is also explained in the Spark programming guide:

// this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._

For example, in the code below .toDF() won't compile unless you import sqlContext.implicits._:

import scala.io.Source   // needed for Source.fromFile

// Airport is assumed to be a case class with seven String fields matching the CSV columns
val airports = sc.makeRDD(Source.fromFile(airportsPath).getLines().drop(1).toSeq, 1)
  .map(s => s.replaceAll("\"", "").split(","))
  .map(a => Airport(a(0), a(1), a(2), a(3), a(4), a(5), a(6)))
  .toDF()

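As a smaller, self-contained sketch of the same point (the Person case class and the value names are placeholders, not from the original tutorial): without the import, the .toDF() call does not compile, failing with an error roughly like "value toDF is not a member of org.apache.spark.rdd.RDD[Person]".

import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

val sqlContext = new SQLContext(sc)   // sc is an existing SparkContext

// Comment out the next line and the .toDF() call below no longer compiles
import sqlContext.implicits._

val peopleDF = sc.parallelize(Seq(Person("Ann", 30), Person("Bob", 25))).toDF()
peopleDF.show()
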
What implicits are bound to the sqlContext instance when it is instantiated and how? Are these implicits defined inside the SQLContext class?

Yes, they are defined in object implicits inside the SQLContext class, which extends SQLImplicits. It looks like there are two kinds of implicit conversions defined there (both sketched below):

  1. RDD-to-DataFrameHolder conversions, which enable the rdd.toDF() call mentioned above.
  2. Various Encoder instances, which are "used to convert a JVM object of type T to and from the internal Spark SQL representation."
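
A rough sketch of both kinds, reusing the Person case class and sqlContext from the sketch above (the .as[Person] part assumes Spark 1.6+, where Dataset was introduced):

// 1. RDD -> DataFrameHolder: gives an RDD of case classes a .toDF() method
val df = sc.parallelize(Seq(Person("Ann", 30), Person("Bob", 25))).toDF()

// 2. Implicit Encoders: describe how to convert a Person to and from Spark's internal
//    representation, e.g. when turning the DataFrame into a typed Dataset[Person]
val ds = df.as[Person]
ds.filter(_.age > 26).show()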