I am storing data in HBase with 5 region servers, using the MD5 hash of the URL as the row key. Currently all the data is ending up in a single region server, so I want to pre-split the table so that writes go uniformly across all region servers. I want to split on the first character of the row key, which is a hex digit (0 to f, so 16 possible values): rows with keys starting 0-3 should go to the 1st region server, 3-6 to the 2nd, 6-9 to the 3rd, 9-c to the 4th, and c-f to the 5th. How can I do it?
In case you are using Apache Phoenix to create tables in HBase, you can specify SALT_BUCKETS in the CREATE TABLE statement. The table will be split into as many regions as the number of buckets specified. Phoenix computes a hash of the row key (most probably a numeric hash % SALT_BUCKETS) and assigns the cell to the appropriate region.
With SALT_BUCKETS = 3, for example, the table is pre-split into 3 regions.
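A minimal sketch, assuming a hypothetical table (the table and column names here are illustrative, not from the question):

```
-- Hypothetical table and column names, for illustration only.
-- SALT_BUCKETS makes Phoenix prepend a single hash byte to every
-- row key, which pre-splits the table into 3 regions.
CREATE TABLE url_data (
    url_hash VARCHAR PRIMARY KEY,
    content  VARCHAR
) SALT_BUCKETS = 3;
```

Note that salting changes the physical row key, so range scans on the original key are spread across all buckets; Phoenix handles this transparently at query time.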
Alternatively, the HBase web UI allows you to split regions manually.
You can provide a SPLITS property when creating the table, as in the sketch below. The 4 split points will generate 5 regions.
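A minimal sketch in the hbase shell, assuming a hypothetical table name and column family:

```
# Hypothetical table 'url_table' with column family 'cf'.
# The 4 split points '3', '6', '9', 'c' divide the hex key space
# into 5 roughly equal regions:
#   [, 3), [3, 6), [6, 9), [9, c), [c, )
create 'url_table', 'cf', SPLITS => ['3', '6', '9', 'c']
```

Since MD5 hex digits are uniformly distributed, these ranges should receive roughly equal write traffic.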
Please note that HBase's DefaultLoadBalancer doesn't guarantee a perfectly even distribution across region servers; it can happen that a single region server hosts multiple regions from the same table.
For more information about how it works, take a look at this:
If all the data has already been stored, I recommend simply moving some regions to other region servers manually using the hbase shell.
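A minimal sketch of the shell commands involved; the table name, encoded region name, and server name below are all placeholders (and `list_regions` requires HBase 1.4+; on older versions the web UI shows the same information):

```
# Find the encoded region names for the table.
list_regions 'url_table'

# Move a region to a specific region server. The first argument is
# the region's encoded name, the second the target server name in
# 'host,port,startcode' form.
move 'ENCODED_REGION_NAME', 'regionserver2.example.com,16020,1538183537131'
```

If the second argument to `move` is omitted, HBase picks a destination server at random.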