I have been able to load this MongoDB database before, but am now receiving an error I haven't been able to figure out.
Here is how I start my Spark session:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("local[*]") \
    .appName("collab_rec") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/example.collection") \
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/example.collection") \
    .getOrCreate()
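As a sanity check (just a sketch, using the same config keys as above), the URIs can be read back from the running session to confirm they are set:

# Confirm the connector URIs are registered on the active session;
# a default is supplied so the call does not raise if a key is missing.
print(spark.conf.get("spark.mongodb.input.uri", "not set"))
print(spark.conf.get("spark.mongodb.output.uri", "not set"))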
I run this script so that I can interact with Spark through IPython; it also loads the Mongo Spark connector package:
#!/bin/bash
export PYSPARK_DRIVER_PYTHON=ipython
${SPARK_HOME}/bin/pyspark \
    --master local[4] \
    --executor-memory 1G \
    --driver-memory 1G \
    --conf spark.sql.warehouse.dir="file:///tmp/spark-warehouse" \
    --packages com.databricks:spark-csv_2.11:1.5.0 \
    --packages com.amazonaws:aws-java-sdk-pom:1.10.34 \
    --packages org.apache.hadoop:hadoop-aws:2.7.3 \
    --packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0
Spark loads fine and it appears the package is loading correctly as well.
Here is how I attempt to load that database into a dataframe:
df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
However, on that line, I receive the following error:
Py4JJavaError: An error occurred while calling o46.load.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.analysis.TypeCoercion$.findTightestCommonTypeOfTwo()Lscala/Function2;
at com.mongodb.spark.sql.MongoInferSchema$.com$mongodb$spark$sql$MongoInferSchema$$compatibleType(MongoInferSchema.scala:132)
at com.mongodb.spark.sql.MongoInferSchema$$anonfun$3.apply(MongoInferSchema.scala:76)
at com.mongodb.spark.sql.MongoInferSchema$$anonfun$3.apply(MongoInferSchema.scala:76)
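For what it's worth, a sketch of a variant I would expect to behave the same way, based on the uri option described in the connector docs (in case the session-level setting is not being picked up):

# Variant of the read that passes the connection string directly to the reader,
# instead of relying on spark.mongodb.input.uri from the session config.
df = spark.read.format("com.mongodb.spark.sql.DefaultSource") \
    .option("uri", "mongodb://127.0.0.1/example.collection") \
    .load()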
As far as I can tell from the following documentation/tutorial, I am loading the DataFrame correctly:
https://docs.mongodb.com/spark-connector/master/python-api/
I am using Spark 2.2.0. Note that I have been able to reproduce this error on both my Mac and on Linux on AWS.
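For completeness, a minimal sketch of how to print the runtime version and the packages that were actually registered, under the assumption that a NoSuchMethodError like the one above usually points to a binary mismatch between the connector build and the Spark version it runs against:

# Print the Spark version and the Maven packages registered via --packages,
# to compare against the connector's supported Spark versions.
print(spark.version)
print(spark.sparkContext.getConf().get("spark.jars.packages", ""))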