HBase >> mail # user >> Improving HBase read performance (based on YCSB)


Re: Improving HBase read performance (based on YCSB)
Hi Bharath,

What does "iostat -dmx 5" say while you're running the benchmark? Let
it print out 10 or 15 lines and copy-paste them here.
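
For a quick read of that output, here is a minimal Python sketch that picks the most recent extended-stats row per device and lists the devices whose %util is near saturation. The function names and the 90% threshold are illustrative assumptions, not part of iostat or HBase, and the column layout is taken from the "Device" header line because it differs between sysstat versions.

```python
def parse_iostat(text):
    """Return the latest extended-stats row per device as {device: {column: value}}."""
    columns, latest = None, {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The header row names the columns; its exact set depends on the sysstat version.
        if fields[0].rstrip(":") == "Device":
            columns = fields[1:]
            continue
        # A data row has a device name followed by one value per header column.
        if columns and len(fields) == len(columns) + 1:
            try:
                latest[fields[0]] = dict(zip(columns, map(float, fields[1:])))
            except ValueError:
                pass  # not a numeric row, ignore it
    return latest


def saturated_devices(text, util_threshold=90.0):
    """List devices whose latest %util is at or above the (assumed) threshold."""
    rows = parse_iostat(text)
    return sorted(dev for dev, row in rows.items()
                  if row.get("%util", 0.0) >= util_threshold)
```

Later reports overwrite earlier ones, which matters because the first report iostat prints is an average since boot rather than a 5-second sample. A device showing high %util and high await at only a few MB/s is the usual signature of a seek-bound disk.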

How do you know the disks have unused bandwidth? Sounds like they're
just bottlenecked on seeks.
Some upcoming work in 0.94 should give you a good boost here (Dhruba's
work to do checksumming at the HBase level).
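
To make the seek argument concrete, here is a back-of-the-envelope sketch in Python. Every number in it is an assumption for illustration, not a measurement from this cluster: one spindle per region server, roughly 10 ms per random seek, 64 KB read per request (the default HBase block size), and two seeks per read on the assumption that HDFS fetches the block data and its separately stored checksum. The function names are hypothetical.

```python
def seek_bound_reads_per_sec(servers, disks_per_server, seek_ms, seeks_per_read):
    """Upper bound on random reads/sec when every read pays full seek cost."""
    iops_per_disk = 1000.0 / seek_ms  # one seek at a time per spindle
    return servers * disks_per_server * iops_per_disk / seeks_per_read


def transfer_mb_per_sec(reads_per_sec, kb_per_read):
    """Disk bandwidth actually moved at that read rate."""
    return reads_per_sec * kb_per_read / 1024.0


# Assumed inputs: 5 servers, 1 disk each, 10 ms seeks, 2 seeks per read.
reads = seek_bound_reads_per_sec(5, 1, 10.0, 2)
print(reads)                           # 250.0 reads/sec across the cluster
print(transfer_mb_per_sec(reads, 64))  # 15.625 MB/s in total
```

Under these assumptions the cluster tops out at a few hundred reads per second while moving only a small fraction of the disks' sequential throughput, so the disks can look idle by bandwidth and still be saturated. Setting seeks_per_read to 1, which is roughly what HBase-level checksums aim for, doubles the bound in this model.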

-Todd

On Mon, Feb 13, 2012 at 8:43 PM, Bharath Ravi <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I have a distributed HBase setup, on which I'm running the YCSB benchmark
> <https://github.com/brianfrankcooper/YCSB/wiki/running-a-workload>.
> There are 5 region servers, each a dual-core machine with around 4GB of memory,
> connected by a 1Gbps Ethernet switch.
>
> The number of "handlers" per regionserver is set to 500 (!) and HDFS's
> maximum receivers per datanode is 4096.
>
> The benchmark dataset is large enough not to fit in memory.
> Update/Insert/Write throughput goes up to 8000 ops/sec easily.
> However, I see read latencies on the order of seconds, and read throughput
> of only a few hundred ops per second.
>
> "top" tells me that the CPUs on the regionservers spend 70-80% of their time
> waiting for IO, while disk and network have plenty of unused bandwidth.
> How could I diagnose where the read bottleneck is?
>
> Any help would be greatly appreciated :)
>
> Thanks in advance!
> --
> Bharath Ravi

--
Todd Lipcon
Software Engineer, Cloudera