I have an HBase (v0.94.19 on Hadoop 1.2.1) setup with one master machine and two region servers. Each region server has a 16 GB heap (6.4 GB block cache, 4.0 GB memstore) and 1.6 TB (2 x 800 GB) of SSD disk space. There is only one table, with a single column family, pre-split into 128 regions (key range 00 - ff). Keys are 32-byte hex strings and values average 800 - 900 bytes. The update rate is around 3k - 5k items per second, of which roughly 20% are new entries. The Hadoop replication factor is set to 2. The rest of the Hadoop and HBase configs are defaults.
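For context, the pre-splitting is done along the hex key prefix; it is roughly equivalent to the sketch below (simplified; "mytable" and "cf" are placeholders, not my real table and column family names):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreatePresplitTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // "mytable" / "cf" stand in for the real table and column family names
        HTableDescriptor desc = new HTableDescriptor("mytable");
        desc.addFamily(new HColumnDescriptor("cf"));

        // 127 split points -> 128 regions covering the hex key space evenly;
        // split keys are the two-hex-digit prefixes 02, 04, ..., fe
        byte[][] splits = new byte[127][];
        for (int i = 1; i < 128; i++) {
            splits[i - 1] = String.format("%02x", i * 2).getBytes();
        }
        admin.createTable(desc, splits);
        admin.close();
    }
}
```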
I ran a read benchmark on this setup (not a real benchmark, just my own code) that reads random but valid entries through the HBase Java client. I get an average of 30 - 40 ms per read, which seems unusually slow to me. The read time also increases as the number of store files grows in each region, and comes back down after a major compaction. The block locality index is always reported as 0 by both region servers, even right after a major compaction.
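The benchmark itself is essentially a single-threaded loop of Gets against keys that are known to exist, roughly like the sketch below (simplified; "mytable", "cf", and the key-loading helper are placeholders, not the real code):

```java
import java.util.List;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomReadTest {

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // "mytable" / "cf" stand in for the real table and column family names
        HTable table = new HTable(conf, "mytable");
        List<String> keys = loadKnownKeys();   // 32-char hex keys sampled from the table earlier
        Random rnd = new Random();

        int reads = 10000;
        long start = System.nanoTime();
        for (int i = 0; i < reads; i++) {
            String key = keys.get(rnd.nextInt(keys.size()));  // random, but guaranteed to exist
            Get get = new Get(Bytes.toBytes(key));
            get.addFamily(Bytes.toBytes("cf"));
            Result result = table.get(get);
            if (result.isEmpty()) {
                throw new IllegalStateException("missing row: " + key);
            }
        }
        double avgMs = (System.nanoTime() - start) / 1e6 / reads;
        System.out.println("average read latency: " + avgMs + " ms");
        table.close();
    }

    private static List<String> loadKnownKeys() {
        // In the real code this reads a key list dumped at load time; stubbed here.
        throw new UnsupportedOperationException("stub");
    }
}
```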
My questions are: Does anyone see any obvious mistakes I am making here? Would increasing the number of disks in each region server (e.g. switching to 4 x 400 GB) help reduce read latency? Are there any SSD optimizations (e.g. over-provisioning) that might help? Lastly, what could cause the block locality index to always be 0?
Please ask me if you need more info. Thank you.