I am quite new to Python 2.6.x, PySpark, Spark 1.6, and HBase 1.1, and I am trying to read data from an HBase table using the Apache Phoenix Spark plugin.
Read Data:
dfRows = sparkConfig.getSqlContext().read \
    .format('org.apache.phoenix.spark') \
    .option('table', 'TableA') \
    .option('zkUrl', 'xxx:2181:/hbase-secure') \
    .load()
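For context on the connection string: as I understand it, Phoenix's `zkUrl` takes the form `host:port:znode-parent` (here `xxx:2181:/hbase-secure`, with the `/hbase-secure` znode used on Kerberized HDP clusters). A small sketch of the split I am assuming — `split_zk_url` is my own illustrative helper, not part of Phoenix:

```python
def split_zk_url(zk_url):
    """Split a Phoenix zkUrl of the form 'host:port:/znode' into parts.

    Illustrative only: Phoenix parses this string itself; this helper
    just shows the structure I believe the option expects.
    """
    # Split on the first two colons only, so the znode path keeps its slash.
    host, port, znode = zk_url.split(':', 2)
    return host, int(port), znode

# The same value passed to .option('zkUrl', ...) above:
print(split_zk_url('xxx:2181:/hbase-secure'))  # ('xxx', 2181, '/hbase-secure')
```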
I run the Python file with spark-submit, passing the following arguments and jars:
spark-submit --master yarn-client \
  --executor-memory 24G --driver-memory 20G \
  --num-executors 10 --queue aQueue \
  --jars /usr/hdp/2.6.1.40-4/phoenix/lib/phoenix-core-4.7.0.2.6.1.40-4.jar,\
/usr/hdp/current/phoenix-client/lib/hbase-client.jar,\
/usr/hdp/current/phoenix-client/phoenix-client.jar,\
/usr/hdp/2.6.1.40-4/hive2/lib/twill-zookeeper-0.6.0-incubating.jar,\
/usr/hdp/2.6.1.40-4/hive2/lib/twill-discovery-api-0.6.0-incubating.jar,\
/usr/hdp/2.6.1.40-4/hive2/lib/hive-hbase-handler.jar \
  Test.py
This works fine; however, when I call dfRows.first(), it throws the following exception:
Caused by: java.lang.IllegalStateException: unread block data
at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2449)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1385)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:207)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more