HBase read performance varying abnormally

Posted 2019-06-11 03:33

Question:

I've installed HBase 0.94.0 and I'm trying to improve read performance for scans. I inserted 100,000 random records.

With setCaching(100), scanning the 100,000 records took 16 seconds.

With setCaching(50), it took 90 seconds.

With setCaching(10), it took 16 seconds again.

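import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class Test {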
    public static void main(String[] args) {

    long start, middle, end;

    HTableDescriptor descriptor = new HTableDescriptor("Student7");
    descriptor.addFamily(new HColumnDescriptor("No"));
    descriptor.addFamily(new HColumnDescriptor("Subject"));

    try {   
    // HBaseConfiguration.create() replaces the deprecated constructor
    Configuration config = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(config);

    admin.createTable(descriptor);
            HTable table = new HTable(config, "Student7");
            System.out.println("Table created !");

    start = System.currentTimeMillis();

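    // Insert rows 1..100000 (the original bound i<100000 stopped at 99999)
    for (int i = 1; i <= 100000; i++) {
        String s = Integer.toString(i);
        Put p = new Put(Bytes.toBytes(s));
        // Store the computed values rather than the literal strings "i+10", "i+20", ...
        p.add(Bytes.toBytes("No"), Bytes.toBytes("IDCARD"), Bytes.toBytes(Integer.toString(i + 10)));
        p.add(Bytes.toBytes("No"), Bytes.toBytes("PHONE"), Bytes.toBytes(Integer.toString(i + 20)));
        p.add(Bytes.toBytes("No"), Bytes.toBytes("PAN"), Bytes.toBytes(Integer.toString(i + 30)));
        p.add(Bytes.toBytes("No"), Bytes.toBytes("ACCT"), Bytes.toBytes(Integer.toString(i + 40)));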
        p.add(Bytes.toBytes("Subject"), Bytes.toBytes("English"),Bytes.toBytes("50"));
        p.add(Bytes.toBytes("Subject"), Bytes.toBytes("Science"),Bytes.toBytes("60"));
        p.add(Bytes.toBytes("Subject"), Bytes.toBytes("History"),Bytes.toBytes("70"));

        table.put(p);
    }
    middle = System.currentTimeMillis();

    Scan s = new Scan();
    s.setCaching(100);      
    ResultScanner scanner = table.getScanner(s);

    try {
        for (Result rr = scanner.next(); rr != null; rr=scanner.next()) {
            System.out.println("Found row: " + rr);
        }
        end = System.currentTimeMillis(); 
    } finally {
        scanner.close();
    }       
    System.out.println("TableCreation-Time: " + (middle - start));
    // end - middle, not middle - end, which would print a negative duration
    System.out.println("Scan-Time: " + (end - middle));
    } catch (IOException e) {
        System.out.println("IOError: cannot create Table.");
        e.printStackTrace();
        }
    }
}

Why is this happening?

Answer 1:

Why would you want to return every record in a 100,000-record table? You're doing a full table scan and, just as in any large database, this is slow.

Try thinking about a more useful use case in which you would like to return some columns of a record or a range of records.

HBase has only one index on its tables: the row key. Make use of that. Try defining your row key so that you can get the data you need just by specifying the row key.
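
One way to make the row key work as an index is to put the attribute you query by at the front of the key and scan by prefix. The following is only a sketch under assumptions: the composite key layout student42|History and the names PrefixRange and stopRowForPrefix are made up for illustration and are not part of the question's schema or of the HBase API. The helper computes an exclusive stop row for a prefix, which, together with the prefix as start row, could be passed to new Scan(startRow, stopRow).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixRange {
    // Returns the smallest key that sorts after every key starting with `prefix`
    // (byte-wise ordering). An empty array means "no upper bound".
    public static byte[] stopRowForPrefix(byte[] prefix) {
        int i = prefix.length - 1;
        // Trailing 0xff bytes cannot be incremented, so skip past them.
        while (i >= 0 && prefix[i] == (byte) 0xff) {
            i--;
        }
        if (i < 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++; // increment the last byte that can be incremented
        return stop;
    }

    public static void main(String[] args) {
        // Hypothetical composite key layout: <studentId>|<subject>
        byte[] start = "student42|".getBytes(StandardCharsets.US_ASCII);
        byte[] stop = stopRowForPrefix(start);
        System.out.println("start row: " + new String(start, StandardCharsets.US_ASCII));
        System.out.println("stop row:  " + new String(stop, StandardCharsets.US_ASCII));
    }
}
```

For the prefix student42| the stop row comes out as student42}, because } is the byte immediately after | in ASCII.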

Let's say you want the value of Subject:History for the rows with a row key between 80000 and 80100. (Note that setCaching(100) means the scanner fetches up to 100 rows per RPC, so in this case a single fetch should cover roughly the whole range. Fetching 100 rows at a time obviously requires more memory than fetching, say, one row at a time. Keep that in mind in a large multi-user environment.)

long start, end;
start = System.currentTimeMillis();

// Row keys are compared as bytes; the start row is inclusive, the stop row exclusive.
Scan s = new Scan(Bytes.toBytes("80000"), Bytes.toBytes("80100"));
s.setCaching(100);
// Return only the Subject:History column
s.addColumn(Bytes.toBytes("Subject"), Bytes.toBytes("History"));

ResultScanner scanner = table.getScanner(s);
try {
    for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
        System.out.println("Found row: " + Bytes.toString(rr.getRow()) + " value: " + Bytes.toString(rr.getValue(Bytes.toBytes("Subject"), Bytes.toBytes("History"))));
    }
    end = System.currentTimeMillis();
} finally {
    scanner.close();
}
System.out.println("Scan: " + (end - start));
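
To put rough numbers on the setCaching note: assuming every fetch returns a full batch, the number of fetch round trips is about the row count divided by the caching value, rounded up. This is back-of-the-envelope arithmetic only, not a measurement of HBase, and the names ScanRpcEstimate and fetchCalls are made up for the illustration.

```java
public class ScanRpcEstimate {
    // Approximate number of fetch round trips needed to stream `rows` rows
    // when the scanner asks for `caching` rows per call (ceiling division).
    public static long fetchCalls(long rows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 100000;
        for (int caching : new int[] {1, 10, 50, 100, 1000}) {
            System.out.println("caching=" + caching + " -> about "
                    + fetchCalls(rows, caching) + " fetch calls");
        }
    }
}
```

By this estimate a full scan of 100,000 rows needs about 10,000 fetches with setCaching(10), 2,000 with setCaching(50) and 1,000 with setCaching(100). The timings in the question do not follow that pattern, which hints that something other than the caching value, such as printing every row to the console, may be dominating the measurement.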

This might look contrived: how would you know which rows you need just from an integer? That is exactly the point. Design the row key around the queries you intend to run, rather than using an incrementing value as you would in a traditional database.
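
One detail worth checking when designing keys: HBase orders row keys as raw bytes, so decimal strings such as "801" and "80000" sort lexicographically rather than numerically. The sketch below simulates that ordering in plain Java (no HBase needed) for keys 1..100000 and counts how many fall in the range 80000 to 80100, first with plain keys as in the question and then with zero-padded keys. The fixed width of 8 digits and all class and method names are assumptions chosen for the illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class RowKeyOrder {
    // Byte-wise unsigned lexicographic comparison, the ordering used for row keys.
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length; // a shorter key sorts before its extensions
    }

    // Builds the key bytes: plain decimal, or zero-padded to a fixed width of 8.
    public static byte[] key(int id, boolean padded) {
        String s = padded ? String.format(Locale.ROOT, "%08d", id) : Integer.toString(id);
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Counts ids in 1..100000 whose key lies in [startId, stopId) under byte ordering.
    public static int countInRange(boolean padded, int startId, int stopId) {
        byte[] start = key(startId, padded);
        byte[] stop = key(stopId, padded);
        int count = 0;
        for (int id = 1; id <= 100000; id++) {
            byte[] k = key(id, padded);
            if (compareBytes(k, start) >= 0 && compareBytes(k, stop) < 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Plain keys also match "801" and "8001".."8010", so this prints 111.
        System.out.println("plain keys:  " + countInRange(false, 80000, 80100));
        // With zero-padding, byte order equals numeric order, so this prints 100.
        System.out.println("padded keys: " + countInRange(true, 80000, 80100));
    }
}
```

Zero-padding is only one option; the point is that the key's byte order has to match the order in which you want to scan.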

Try this example. It should be fast.

Note: I didn't run the example. I just typed it here. Maybe there are some small syntax errors you should correct but I hope the idea is clear.