HBase, mail # user - HBase random read performance


Re: HBase random read performance
谢良 2013-04-15, 11:41
First, setting the block size to 4KB probably won't help; please refer to the comment at the beginning of HFile.java:

 Smaller blocks are good
 * for random access, but require more memory to hold the block index, and may
 * be slower to create (because we must flush the compressor stream at the
 * conclusion of each data block, which leads to an FS I/O flush). Further, due
 * to the internal caching in Compression codec, the smallest possible block
 * size would be around 20KB-30KB.
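
To put the block-index cost in rough numbers, here is a back-of-the-envelope sketch. It assumes the 25 million records and the ~1.6KB record size reported in this thread, and it ignores compression, key/value overhead, and HDFS replication. The class and method names are made up for illustration and are not part of HBase.

```java
// Rough estimate only: ignores compression, key/value overhead, and replication.
final class BlockCountEstimate {
    /** Number of blocks needed to hold totalBytes at the given block size (ceiling division). */
    static long blocksFor(long totalBytes, long blockSizeBytes) {
        return (totalBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long records = 25_000_000L;   // record count from the original post
        long recordBytes = 1_638L;    // assumption: 1.6KB is about 1638 bytes per record
        long totalBytes = records * recordBytes;
        for (long blockSize : new long[] {64 * 1024, 8 * 1024, 4 * 1024}) {
            System.out.printf("block size %d KB -> about %d blocks to index%n",
                    blockSize / 1024, blocksFor(totalBytes, blockSize));
        }
    }
}
```

Going from 64KB to 4KB blocks multiplies the number of index entries by about 16, which is the memory cost the comment above refers to.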

Second, is the test client single-threaded or multi-threaded? We can't expect much throughput if the requests are issued one at a time.
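
As a minimal sketch of what a multi-threaded client could look like, the row keys can be split into batches and each batch fetched on its own worker thread. Only the JDK is used here: `fetchBatch` is a hypothetical stand-in for a real multi-get such as `HTable.get(List<Get>)`, and the batch size and thread count are arbitrary assumptions that would need tuning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelBatchReader {
    /** Splits keys into consecutive batches of at most batchSize elements. */
    static <K> List<List<K>> partition(List<K> keys, int batchSize) {
        List<List<K>> batches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            batches.add(new ArrayList<>(keys.subList(i, Math.min(i + batchSize, keys.size()))));
        }
        return batches;
    }

    /**
     * Runs fetchBatch on every batch using a fixed pool of worker threads and
     * returns the results in the same order as the input keys.
     * fetchBatch is a hypothetical placeholder for the real multi-get call.
     */
    static <K, V> List<V> readAll(List<K> keys, int batchSize, int threads,
                                  Function<List<K>, List<V>> fetchBatch) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<V>>> futures = new ArrayList<>();
            for (List<K> batch : partition(keys, batchSize)) {
                Callable<List<V>> task = () -> fetchBatch.apply(batch);
                futures.add(pool.submit(task));
            }
            List<V> results = new ArrayList<>();
            for (Future<List<V>> future : futures) {
                results.addAll(future.get()); // blocks until that batch has finished
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException("batch read failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With a real HBase client, each worker would need its own table handle, because `HTable` instances are not safe to share between threads.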

Third, could you provide more information about the number of disks per DataNode (DN) and their I/O utilization?

Thanks,
Liang
________________________________________
From: Ankit Jain [[EMAIL PROTECTED]]
Sent: April 15, 2013 18:53
To: [EMAIL PROTECTED]
Subject: Re: HBase random read performance

Hi Anoop,

Thanks for the reply.

I tried setting the HFile block size to 4KB and also enabled the Bloom
filter (ROW). The best read performance I was able to achieve is
10000 records in 14 secs (each record is 1.6KB).
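
For scale, these figures work out to roughly 714 records per second, or about 1.1 MB/s, with an average of 1.4 ms per record. The small sketch below only redoes that arithmetic for the 17-second and 14-second runs reported in this thread; the class and method names are illustrative.

```java
// Plain arithmetic on the numbers quoted in the thread; no HBase calls involved.
final class ReadThroughput {
    /** Records returned per second. */
    static double recordsPerSecond(long records, double seconds) {
        return records / seconds;
    }

    /** Payload rate in KB per second, given the size of one record in KB. */
    static double kilobytesPerSecond(long records, double recordKb, double seconds) {
        return records * recordKb / seconds;
    }

    public static void main(String[] args) {
        long records = 10_000L;
        double recordKb = 1.6;
        for (double seconds : new double[] {17.0, 14.0}) {
            System.out.printf("%.0f s: %.0f records/s, %.0f KB/s%n",
                    seconds,
                    recordsPerSecond(records, seconds),
                    kilobytesPerSecond(records, recordKb, seconds));
        }
    }
}
```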

Please suggest some further tuning.

Thanks,
Ankit Jain

On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal <
[EMAIL PROTECTED]> wrote:

> Interesting. Can you explain why this happens?
>
> -----Original Message-----
> From: Anoop Sam John [mailto:[EMAIL PROTECTED]]
> Sent: Monday, April 15, 2013 3:47 PM
> To: [EMAIL PROTECTED]
> Subject: RE: HBase random read performance
>
> Ankit,
> I guess you are using the default HFile block size, which is 64KB.
> For random gets a lower value will be better. Try something like 8KB
> and check the latency.
>
> Yes, of course Bloom filters can help (if a major compaction was not done
> at the time of testing).
>
> -Anoop-
> ________________________________________
> From: Ankit Jain [[EMAIL PROTECTED]]
> Sent: Saturday, April 13, 2013 11:01 AM
> To: [EMAIL PROTECTED]
> Subject: HBase random read performance
>
> Hi All,
>
> We are using HBase 0.94.5 and Hadoop 1.0.4.
>
> We have an HBase cluster of 5 nodes (5 regionservers and 1 master node). Each
> regionserver has 8 GB RAM.
>
> We have loaded 25 million records into an HBase table that is pre-split
> into 16 regions, and all the regions are equally loaded.
>
> We are getting very low random read performance when performing a multi-get
> from HBase.
>
> We pass 10000 random row keys as input, and HBase takes around
> 17 secs to return the 10000 records.
>
> Please suggest some tuning to increase HBase read performance.
>
> Thanks,
> Ankit Jain
> iLabs

--
Thanks,
Ankit Jain