HBase >> mail # user >> HBase random read performance


Re: HBase random read performance
First, setting the block size to 4KB probably won't help; please refer to the comment at the beginning of HFile.java:

 * Smaller blocks are good
 * for random access, but require more memory to hold the block index, and may
 * be slower to create (because we must flush the compressor stream at the
 * conclusion of each data block, which leads to an FS I/O flush). Further, due
 * to the internal caching in Compression codec, the smallest possible block
 * size would be around 20KB-30KB.
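
To make the block-index trade-off in that comment concrete, here is a rough sketch of how the number of data blocks grows as the block size shrinks. It uses the 25 million records and 1.6KB record size mentioned in this thread; the 64-byte index entry size is a made-up placeholder, and the model (one index entry per block, no compression, no key/value overhead) is a simplification rather than how HFile actually lays out its index.

```java
// Rough estimate of how block size affects the number of data blocks
// (and therefore block index entries). All sizing inputs are assumptions.
final class BlockIndexEstimate {

    // Number of blocks needed to hold totalBytes (ceiling division).
    static long blockCount(long totalBytes, long blockSizeBytes) {
        return (totalBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Approximate index footprint: one entry of entryBytes per block.
    static long indexBytes(long totalBytes, long blockSizeBytes, long entryBytes) {
        return blockCount(totalBytes, blockSizeBytes) * entryBytes;
    }

    public static void main(String[] args) {
        long records = 25_000_000L;   // from the thread
        long recordBytes = 1_600L;    // "1.6KB" from the thread, taken as 1600 bytes
        long totalBytes = records * recordBytes; // uncompressed, overhead ignored
        long entryBytes = 64L;        // hypothetical average index entry size

        for (long kb : new long[] {4, 8, 64}) {
            long blockSize = kb * 1024;
            System.out.printf("%2d KB blocks: %,d blocks, ~%,d MB of index%n",
                    kb,
                    blockCount(totalBytes, blockSize),
                    indexBytes(totalBytes, blockSize, entryBytes) / (1024 * 1024));
        }
    }
}
```

Under these assumptions, going from 64KB to 4KB blocks multiplies the number of index entries by roughly 16, which is the memory cost the comment warns about.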

Second, is the test client single-threaded or multi-threaded? We can't expect much throughput if the requests are issued one at a time.
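
As an illustration of the multi-threaded alternative, the sketch below splits the row keys into batches and fetches the batches concurrently on a fixed thread pool. The class name, `partition`, `fetchAll`, and the `fetchBatch` callback are all hypothetical names for this example; `fetchBatch` stands in for whatever batched read the real client performs (for instance a multi-get against the table, with each thread using its own table handle, since HTable is not thread-safe). The batch size and thread count are arbitrary and would need tuning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelMultiGet {

    // Splits keys into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            int end = Math.min(i + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(i, end)));
        }
        return batches;
    }

    // Runs fetchBatch on every batch using a fixed pool of threads and
    // returns the results concatenated in batch order.
    static <K, V> List<V> fetchAll(List<K> keys, int batchSize, int threads,
                                   Function<List<K>, List<V>> fetchBatch) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<V>>> futures = new ArrayList<>();
            for (List<K> batch : partition(keys, batchSize)) {
                Callable<List<V>> task = () -> fetchBatch.apply(batch);
                futures.add(pool.submit(task));
            }
            List<V> results = new ArrayList<>();
            for (Future<List<V>> future : futures) {
                try {
                    results.addAll(future.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a batch", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("batch fetch failed", e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            keys.add("row-" + i); // placeholder row keys
        }
        // Stand-in for a real batched read: maps each key to a fake value.
        Function<List<String>, List<String>> fakeFetch = batch -> {
            List<String> out = new ArrayList<>();
            for (String key : batch) {
                out.add("value-of-" + key);
            }
            return out;
        };
        List<String> values = fetchAll(keys, 500, 8, fakeFetch);
        System.out.println("fetched " + values.size() + " values");
    }
}
```

Whether this helps depends on where the time goes: if the disks on the DataNodes are already saturated, more client threads will only add queueing.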

Third, could you provide more information about the number of disks on your DataNodes (DN) and their I/O utilization?

Thanks,
Liang
________________________________________
From: Ankit Jain [[EMAIL PROTECTED]]
Sent: April 15, 2013 18:53
To: [EMAIL PROTECTED]
Subject: Re: HBase random read performance

Hi Anoop,

Thanks for the reply.

I tried setting the HFile block size to 4KB and also enabled the Bloom
filter (ROW). The maximum read performance that I was able to achieve is
10000 records in 14 secs (each record is 1.6KB).
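
For scale, the figures reported in this thread (10000 records in 17 secs originally, 14 secs after the change) can be turned into per-record rates with a small calculation. This is only arithmetic on the numbers quoted here; the class and method names are illustrative.

```java
// Converts "N records in S seconds" into per-record rates.
final class ReadThroughput {

    static double recordsPerSecond(long records, double seconds) {
        return records / seconds;
    }

    static double millisPerRecord(long records, double seconds) {
        return seconds * 1000.0 / records;
    }

    public static void main(String[] args) {
        long records = 10_000L;                  // from the thread
        double[] runsInSeconds = {17.0, 14.0};   // before and after the block size change
        for (double seconds : runsInSeconds) {
            System.out.printf("%.0f s: %.1f records/s, %.2f ms per record%n",
                    seconds,
                    recordsPerSecond(records, seconds),
                    millisPerRecord(records, seconds));
        }
    }
}
```

Both runs come out above one millisecond per record on average.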

Please suggest some tuning.

Thanks,
Ankit Jain

On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal <
[EMAIL PROTECTED]> wrote:

> Interesting. Can you explain why this happens?
>
> -----Original Message-----
> From: Anoop Sam John [mailto:[EMAIL PROTECTED]]
> Sent: Monday, April 15, 2013 3:47 PM
> To: [EMAIL PROTECTED]
> Subject: RE: HBase random read performance
>
> Ankit,
> I guess you are using the default HFile block size, which is 64KB.
> For random gets a lower value will be better. Try something like 8KB
> and check the latency.
>
> Yes, of course Bloom filters can help (if a major compaction had not been
> done at the time of testing).
>
> -Anoop-
> ________________________________________
> From: Ankit Jain [[EMAIL PROTECTED]]
> Sent: Saturday, April 13, 2013 11:01 AM
> To: [EMAIL PROTECTED]
> Subject: HBase random read performance
>
> Hi All,
>
> We are using HBase 0.94.5 and Hadoop 1.0.4.
>
> We have an HBase cluster of 5 nodes (5 regionservers and 1 master node). Each
> regionserver has 8 GB RAM.
>
> We have loaded 25 million records into an HBase table. The table is pre-split
> into 16 regions and all the regions are equally loaded.
>
> We are getting very low random read performance while performing a multi-get
> from HBase.
>
> We are passing 10000 random row keys as input, and HBase is taking around
> 17 secs to return the 10000 records.
>
> Please suggest some tuning to increase HBase read performance.
>
> Thanks,
> Ankit Jain
> iLabs
>
>
>
> --
> Thanks,
> Ankit Jain
>
> ________________________________
>
> NOTE: This message may contain information that is confidential,
> proprietary, privileged or otherwise protected by law. The message is
> intended solely for the named addressee. If received in error, please
> destroy and notify the sender. Any use of this email is prohibited when
> received in error. Impetus does not represent, warrant and/or guarantee,
> that the integrity of this communication has been maintained nor that the
> communication is free of errors, virus, interception or interference.
>

--
Thanks,
Ankit Jain