'SparkSession' object has no attribute 'serializer'

Published 2019-08-19 21:55

Question:

I am using Apache Spark in batch mode. I have set up an entire pipeline that transforms text into TF-IDF vectors and then predicts a boolean class using logistic regression:

# Chain previously created feature transformers, indexers and regression in a Pipeline
from pyspark.ml import Pipeline

pipeline = Pipeline(stages=[tokenizer, hashingTF, idf,
                            labelIndexer, featureIndexer, lr])

# Fit the full model to the training data
model = pipeline.fit(trainingData)

# Predict on the test data
predictions = model.transform(testData)

I can examine predictions, which is a Spark DataFrame, and it is what I expect it to be. Next, I want to see a confusion matrix, so I convert the scores and labels to an RDD and pass that to BinaryClassificationMetrics():

predictionAndLabels = predictions.select('prediction','label').rdd

Finally, I pass that to the BinaryClassificationMetrics:

metrics = BinaryClassificationMetrics(predictionAndLabels)  # this errors out

Here's the error:

AttributeError: 'SparkSession' object has no attribute 'serializer'

This error is not helpful, and searching for it turns up a broad spectrum of unrelated issues. The only thing I've found that seems similar is this post, which has no answers: How to resolve error "AttributeError: 'SparkSession' object has no attribute 'serializer'"?

Any assistance is appreciated!

Answer 1:

For posterity's sake, here's what I did to fix this. When initializing the Spark session and the SQL context, I was doing this, which is not right:

sc = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()
sqlContext = SQLContext(sc)

This problem was resolved by doing this instead:

sc = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()
sqlContext = SQLContext(sparkContext=sc.sparkContext, sparkSession=sc)

I'm not sure why that needed to be explicit, and would welcome clarification from the community if someone knows.