spark - No schema defined, and no Parquet data file or summary file found

Posted 2019-08-12 09:00

Question:

First, I started

$SPARK_HOME/bin/pyspark

and wrote this code:

sqlContext.load("jdbc", url="jdbc:mysql://IP:3306/test", driver="com.mysql.jdbc.Driver", dbtable="test.test_tb")

When I write only dbtable="test_db", the error is the same.

After that, this error occurred:

py4j.protocol.Py4JJavaError: An error occurred while calling o66.load.
: java.lang.AssertionError: assertion failed: No schema defined, and no Parquet data file or summary file found under .
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.org$apache$spark$sql$parquet$ParquetRelation2$MetadataCache$$readSchema(newParquet.scala:429)
    .....

Why did this error occur? I want to understand and solve this problem.

Thank you.

Answer 1:

I don't know the reason for this error, but I stumbled upon it too, and then found a way to make the same thing work.

Try this:

df = sqlContext.read.format("jdbc").options(url="jdbc:mysql://server/table?user=usr&password=secret", dbtable="table_name").load()

I suppose that .load syntax no longer works, or at least does not work for JDBC. Hope it helps!
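A likely cause of the original error, for what it's worth: in the Spark 1.3 Python API, the first positional parameter of sqlContext.load() is a file path, not the data source name, so passing "jdbc" positionally leaves the source at its default (Parquet), and Spark then looks for Parquet files. Using the reader API above, the asker's MySQL case would look roughly like this (a sketch: the URL, driver class, and table come from the question; "usr" and "secret" are placeholder credentials):

# Sketch of the same read.format("jdbc") pattern for the asker's MySQL setup.
# The url, driver, and dbtable values are taken from the question;
# user/password are placeholders -- substitute your own.
df = sqlContext.read.format("jdbc").options(
    url="jdbc:mysql://IP:3306/test",
    driver="com.mysql.jdbc.Driver",
    dbtable="test_tb",
    user="usr",
    password="secret"
).load()
df.printSchema()  # verify the table's schema was read over JDBC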

By the way, I started the console with this command:

SPARK_CLASSPATH=~/progs/postgresql-9.4-1205.jdbc42.jar pyspark

My database is PostgreSQL, so I downloaded the jar with the JDBC driver and added it to my classpath as suggested in the documentation: http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
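Since the question uses MySQL rather than PostgreSQL, the equivalent launch would put the MySQL Connector/J jar on the classpath instead (a sketch: the jar path and version below are placeholders for whatever driver jar you downloaded):

# Sketch: launching pyspark with the MySQL JDBC driver on the classpath.
# The jar path/version is a placeholder; point it at your own download.
SPARK_CLASSPATH=~/progs/mysql-connector-java-5.1.38.jar pyspark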