HBase, mail # user - HBase random read performance


Ankit Jain 2013-04-13, 05:31
Ted Yu 2013-04-13, 15:16
Adrien Mogenet 2013-04-13, 16:00
Harsh J 2013-04-13, 17:02
Jean-Marc Spaggiari 2013-04-14, 21:58
Anoop Sam John 2013-04-15, 10:17
Rishabh Agrawal 2013-04-15, 10:42
Ankit Jain 2013-04-15, 10:53
谢良 2013-04-15, 11:41
Ankit Jain 2013-04-15, 13:04

Re: HBase random read performance
Doug Meil 2013-04-15, 13:21

Hi there, regarding this...

> We are passing random 10000 row-keys as input, while HBase is taking around
> 17 secs to return 10000 records.
….  Given that you are generating 10,000 random keys, your multi-get is
very likely hitting all 5 nodes of your cluster.
Historically, multi-Get used to first sort the requests by RS and then
*serially* go to each RS to process the multi-Get.  I'm not sure whether the
current (0.94.x) behavior multi-threads the per-RS requests or not.

One thing you might want to consider is confirming that client behavior,
and, if it's not multi-threading, then perform a test that does the same RS
sorting yourself via...

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29

…. and then spin up your own threads (one per target RS) and see what
happens.
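
If the client does turn out to issue the per-RS batches serially, a rough
sketch of the do-it-yourself variant could look like the following. This is a
minimal illustration, not tested against the cluster in this thread: the table
name ("mytable") and the key source (loadRandomKeys) are placeholders, and each
worker opens its own HTable instance since HTable is not thread-safe.

import java.util.*;
import java.util.concurrent.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class PerServerMultiGet {

  public static void main(String[] args) throws Exception {
    final Configuration conf = HBaseConfiguration.create();
    final byte[] tableName = Bytes.toBytes("mytable");  // placeholder table name
    List<byte[]> rowKeys = loadRandomKeys();            // placeholder: the 10000 random keys

    // Bucket the Gets by the region server currently hosting each row.
    HTable locator = new HTable(conf, tableName);
    Map<String, List<Get>> getsByServer = new HashMap<String, List<Get>>();
    for (byte[] row : rowKeys) {
      HRegionLocation loc = locator.getRegionLocation(row);
      String server = loc.getHostname() + ":" + loc.getPort();
      List<Get> gets = getsByServer.get(server);
      if (gets == null) {
        gets = new ArrayList<Get>();
        getsByServer.put(server, gets);
      }
      gets.add(new Get(row));
    }
    locator.close();

    // One worker per target region server; each worker uses its own HTable.
    ExecutorService pool =
        Executors.newFixedThreadPool(Math.max(1, getsByServer.size()));
    List<Future<Result[]>> futures = new ArrayList<Future<Result[]>>();
    for (final List<Get> gets : getsByServer.values()) {
      futures.add(pool.submit(new Callable<Result[]>() {
        public Result[] call() throws Exception {
          HTable table = new HTable(conf, tableName);
          try {
            return table.get(gets);  // multi-get for this server's keys only
          } finally {
            table.close();
          }
        }
      }));
    }

    int fetched = 0;
    for (Future<Result[]> f : futures) {
      fetched += f.get().length;
    }
    pool.shutdown();
    System.out.println("Fetched " + fetched + " rows");
  }

  // Placeholder for however the 10000 random keys are produced in the test.
  private static List<byte[]> loadRandomKeys() {
    return new ArrayList<byte[]>();
  }
}

Comparing the wall-clock time of this against the plain HTable.get(List<Get>)
call would show whether the stock multi-get is already parallel per RS.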

On 4/15/13 9:04 AM, "Ankit Jain" <[EMAIL PROTECTED]> wrote:

>Hi Liang,
>
>Thanks, Liang, for the reply.
>
>Ans1:
>I tried by using an HFile block size of 32 KB with the bloom filter enabled.
>The random read performance is 10000 records in 23 secs.
>
>Ans2:
>We are retrieving all the 10000 rows in one call.
>
>Ans3:
>Disk detail:
>Model Number:       ST2000DM001-1CH164
>Serial Number:      Z1E276YF
>
>Please suggest some more optimization
>
>Thanks,
>Ankit Jain
>
>On Mon, Apr 15, 2013 at 5:11 PM, 谢良 <[EMAIL PROTECTED]> wrote:
>
>> First, it probably won't help to set the block size to 4KB; please refer
>> to the beginning of HFile.java:
>>
>>  * Smaller blocks are good for random access, but require more memory to
>>  * hold the block index, and may be slower to create (because we must flush
>>  * the compressor stream at the conclusion of each data block, which leads
>>  * to an FS I/O flush). Further, due to the internal caching in Compression
>>  * codec, the smallest possible block size would be around 20KB-30KB.
>>
>> Second, is it a single-threaded test client or multi-threaded? We can't
>> expect too much if the requests are issued one by one.
>>
>> Third, could you provide more info about your DN disk counts and I/O
>> utilization?
>>
>> Thanks,
>> Liang
>> ________________________________________
>> From: Ankit Jain [[EMAIL PROTECTED]]
>> Sent: April 15, 2013 18:53
>> To: [EMAIL PROTECTED]
>> Subject: Re: HBase random read performance
>>
>> Hi Anoop,
>>
>> Thanks for the reply.
>>
>> I tried setting the HFile block size to 4KB and also enabled the bloom
>> filter (ROW). The maximum read performance that I was able to achieve is
>> 10000 records in 14 secs (size of record is 1.6KB).
>>
>> Please suggest some tuning..
>>
>> Thanks,
>> Ankit Jain
>>
>>
>>
>> On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal <
>> [EMAIL PROTECTED]> wrote:
>>
>> > Interesting. Can you explain why this happens?
>> >
>> > -----Original Message-----
>> > From: Anoop Sam John [mailto:[EMAIL PROTECTED]]
>> > Sent: Monday, April 15, 2013 3:47 PM
>> > To: [EMAIL PROTECTED]
>> > Subject: RE: HBase random read performance
>> >
>> > Ankit
>> >                  I guess you might be having the default HFile block size,
>> > which is 64KB.
>> > For random gets a lower value will be better. Try with something like 8KB
>> > and check the latency?
>> >
>> > Ya, of course blooms can help (if major compaction was not done at the
>> > time of testing)
>> >
>> > -Anoop-
>> > ________________________________________
>> > From: Ankit Jain [[EMAIL PROTECTED]]
>> > Sent: Saturday, April 13, 2013 11:01 AM
>> > To: [EMAIL PROTECTED]
>> > Subject: HBase random read performance
>> >
>> > Hi All,
>> >
>> > We are using HBase 0.94.5 and Hadoop 1.0.4.
>> >
>> > We have an HBase cluster of 5 nodes (5 region servers and 1 master node).
>> > Each region server has 8 GB RAM.
>> >
>> > We have loaded 25 million records into an HBase table; the table is
>> > pre-split into 16 regions and all the regions are equally loaded.
>> >
>> > We are getting very low random read performance while performing multi-get
>> > from HBase.
>> >
>> > We are passing random 10000 row-keys as input, while HBase is taking
>> > around 17 secs to return 10000 records.
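
For reference, the block-size and bloom-filter tuning suggested earlier in the
thread is set per column family. A minimal sketch using the 0.94 client API
mentioned above; the table name ("mytable") and family name ("cf") are
placeholders, not names from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.regionserver.StoreFile;
import org.apache.hadoop.hbase.util.Bytes;

public class TuneReadColumnFamily {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    byte[] tableName = Bytes.toBytes("mytable");      // placeholder table name

    // Smaller data blocks favor random gets at the cost of a larger block
    // index; ROW blooms let gets skip store files that cannot contain the key.
    HColumnDescriptor cf = new HColumnDescriptor("cf");  // placeholder family
    cf.setBlocksize(8 * 1024);                            // ~8KB, as suggested above
    cf.setBloomFilterType(StoreFile.BloomType.ROW);

    // Altering an existing table requires disabling it first; existing store
    // files keep their old block size until a major compaction rewrites them.
    admin.disableTable(tableName);
    admin.modifyColumn(tableName, cf);
    admin.enableTable(tableName);
    admin.majorCompact(tableName);
    admin.close();
  }
}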

Ted Yu 2013-04-15, 13:30
Ted Yu 2013-04-15, 14:13
Ted Yu 2013-04-15, 17:03
lars hofhansl 2013-04-16, 14:55
Liu, Raymond 2013-04-16, 07:49
Nicolas Liochon 2013-04-16, 08:22
Jean-Marc Spaggiari 2013-04-16, 11:01
Michel Segel 2013-04-17, 12:33
Håvard Wahl Kongsgård 2013-04-14, 22:19
Mohammad Tariq 2013-04-14, 22:39
Ted Yu 2013-07-08, 12:49