I am using Spark 1.4 and trying to read 2.7 GB of data from HBase via sc.newAPIHadoopRDD, but only 5 tasks are created for this stage and it takes 2 to 3 minutes to process. Can anyone let me know how to increase the number of partitions so the data is read faster?
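For reference, a minimal sketch of the kind of read involved (the table name is hypothetical); the resulting RDD's partition count mirrors the table's region count:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

// Hypothetical table name; sc is the SparkContext from the Spark 1.4 shell/app.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")

val rdd = sc.newAPIHadoopRDD(
  hbaseConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

// One partition per HBase region; with 5 regions you get 5 tasks.
println(rdd.partitions.length)
```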
org.apache.hadoop.hbase.mapreduce.TableInputFormat creates one partition per HBase region, so your table appears to be split into 5 regions. Pre-splitting your table into more regions will increase the number of input partitions (have a look here for more information on splitting).
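As a sketch under stated assumptions, pre-splitting at table-creation time via the HBase Admin API (HBase 0.98/1.x era, matching Spark 1.4) might look like this; the table name, column family, and split keys are hypothetical and should match your actual row-key distribution:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.util.Bytes

// Hypothetical names and split points; pick splits that divide your row keys evenly.
val admin = new HBaseAdmin(HBaseConfiguration.create())
val desc = new HTableDescriptor(TableName.valueOf("my_table"))
desc.addFamily(new HColumnDescriptor("cf"))

// 3 split keys => 4 regions => 4 input partitions in newAPIHadoopRDD.
val splitKeys: Array[Array[Byte]] = Array("row_25", "row_50", "row_75").map(s => Bytes.toBytes(s))
admin.createTable(desc, splitKeys)
admin.close()
```

Note that calling repartition on the resulting RDD only redistributes the data after the 5-region scan has already run, so it will not speed up the read itself; adding regions (pre-splitting at creation, or splitting an existing table, e.g. with the HBase shell's split command) is what increases scan parallelism.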