Exception: Unread Block Data with PySpark, Phoenix

Published 2019-08-04 14:42

Question:

I am quite new to Python 2.6.x, PySpark, Spark 1.6, and HBase 1.1, and I am trying to read data from a table using the Apache Phoenix Spark plugin.

Read Data:

dfRows = sparkConfig.getSqlContext().read\
                          .format('org.apache.phoenix.spark')\
                          .option('table', 'TableA')\
                          .option('zkUrl', 'xxx:2181:/hbase-secure')\
                          .load()

I run the Python file with spark-submit, using the arguments and jars below:

spark-submit --master yarn-client --executor-memory 24G --driver-memory 20G --num-executors 10 --queue aQueue --jars /usr/hdp/2.6.1.40-4/phoenix/lib/phoenix-core-4.7.0.2.6.1.40-4.jar,/usr/hdp/current/phoenix-client/lib/hbase-client.jar,/usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/2.6.1.40-4/hive2/lib/twill-zookeeper-0.6.0-incubating.jar,/usr/hdp/2.6.1.40-4/hive2/lib/twill-discovery-api-0.6.0-incubating.jar,/usr/hdp/2.6.1.40-4/hive2/lib/hive-hbase-handler.jar Test.py
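
For readability, the same invocation could be assembled from a list of jars. This is only a sketch: build_submit_command and its parameters are hypothetical names made up for illustration, not part of Spark or Phoenix, and the paths are simply the ones from the command above.

```python
# Sketch only: build_submit_command is a hypothetical helper, not a Spark API.
def build_submit_command(script, jars, master="yarn-client", queue=None, extra_args=()):
    """Return the spark-submit argument list for the given script and jars."""
    cmd = ["spark-submit", "--master", master]
    cmd += list(extra_args)
    if queue:
        cmd += ["--queue", queue]
    if jars:
        # --jars takes one comma-separated value with no spaces in it.
        cmd += ["--jars", ",".join(jars)]
    cmd.append(script)
    return cmd


# Jar paths copied from the command above.
jars = [
    "/usr/hdp/2.6.1.40-4/phoenix/lib/phoenix-core-4.7.0.2.6.1.40-4.jar",
    "/usr/hdp/current/phoenix-client/lib/hbase-client.jar",
    "/usr/hdp/current/phoenix-client/phoenix-client.jar",
    "/usr/hdp/2.6.1.40-4/hive2/lib/twill-zookeeper-0.6.0-incubating.jar",
    "/usr/hdp/2.6.1.40-4/hive2/lib/twill-discovery-api-0.6.0-incubating.jar",
    "/usr/hdp/2.6.1.40-4/hive2/lib/hive-hbase-handler.jar",
]

command = build_submit_command(
    "Test.py",
    jars,
    queue="aQueue",
    extra_args=["--executor-memory", "24G", "--driver-memory", "20G",
                "--num-executors", "10"],
)
print(" ".join(command))
```

The resulting list can be passed to subprocess.call(command) or printed and pasted into a shell.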

The job starts fine; however, when I call dfRows.first(), it throws the following exception:

Caused by: java.lang.IllegalStateException: unread block data
        at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2449)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1385)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:207)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        ... 1 more