HBase user mailing list: HBase random read performance


Ankit Jain 2013-04-13, 05:31
Ted Yu 2013-04-13, 15:16
Adrien Mogenet 2013-04-13, 16:00
Harsh J 2013-04-13, 17:02
Jean-Marc Spaggiari 2013-04-14, 21:58
Anoop Sam John 2013-04-15, 10:17
Rishabh Agrawal 2013-04-15, 10:42
Ankit Jain 2013-04-15, 10:53
谢良 2013-04-15, 11:41
Ankit Jain 2013-04-15, 13:04
Re: Re: HBase random read performance

Hi there, regarding this...

> We are passing random 10000 row-keys as input, while HBase is taking
> around 17 secs to return 10000 records.

Given that you are generating 10,000 random keys, your multi-get is very
likely hitting all 5 nodes of your cluster.
Historically, a multi-get would first sort the requests by RS and then go to
each RS *serially* to process its portion of the multi-get.  I'm not sure
whether the current (0.94.x) client multi-threads this or not.

One thing you might want to consider is confirming that client behavior; if
it isn't multi-threading, perform a test that does the same RS sorting via...

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29

…. and then spin up your own threads (one per target RS) and see what
happens.
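
To make that concrete, here is a minimal sketch of the per-RS approach,
assuming the 0.94-era client API (HTable, Get, Result, and
HTable#getRegionLocation(byte[])). The class and method names below are made
up for illustration, and the exact accessors on HRegionLocation may differ
between client versions:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class PerRegionServerGets {

  // Bucket the row keys by the region server that hosts them, then issue
  // one multi-get per server from its own thread.
  public static List<Result> parallelGet(final Configuration conf,
                                         final String tableName,
                                         List<byte[]> rowKeys) throws Exception {
    Map<String, List<Get>> getsByServer = new HashMap<String, List<Get>>();
    HTable locator = new HTable(conf, tableName);
    try {
      for (byte[] row : rowKeys) {
        // getRegionLocation() tells us which server hosts the region for this row.
        HRegionLocation loc = locator.getRegionLocation(row);
        String server = loc.getHostname() + ":" + loc.getPort();
        List<Get> gets = getsByServer.get(server);
        if (gets == null) {
          gets = new ArrayList<Get>();
          getsByServer.put(server, gets);
        }
        gets.add(new Get(row));
      }
    } finally {
      locator.close();
    }

    // One worker thread per target region server, each issuing a single multi-get.
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, getsByServer.size()));
    List<Future<Result[]>> futures = new ArrayList<Future<Result[]>>();
    for (final List<Get> gets : getsByServer.values()) {
      futures.add(pool.submit(new Callable<Result[]>() {
        public Result[] call() throws Exception {
          // HTable is not thread-safe, so each worker opens its own instance.
          HTable table = new HTable(conf, tableName);
          try {
            return table.get(gets);  // multi-get scoped to rows on one server
          } finally {
            table.close();
          }
        }
      }));
    }

    // Collect the per-server results back into a single list.
    List<Result> results = new ArrayList<Result>();
    for (Future<Result[]> f : futures) {
      for (Result r : f.get()) {
        results.add(r);
      }
    }
    pool.shutdown();
    return results;
  }
}

If the client really is visiting the region servers serially, splitting the
work like this should show a noticeable drop in total latency for the
10,000-key test.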

On 4/15/13 9:04 AM, "Ankit Jain" <[EMAIL PROTECTED]> wrote:

>Hi Liang,
>
>Thanks, Liang, for the reply.
>
>Ans1:
>I tried using an HFile block size of 32 KB with the bloom filter enabled.
>The random read performance is 10000 records in 23 secs.
>
>Ans2:
>We are retrieving all the 10000 rows in one call.
>
>Ans3:
>Disk detail:
>Model Number:       ST2000DM001-1CH164
>Serial Number:      Z1E276YF
>
>Please suggest some more optimizations.
>
>Thanks,
>Ankit Jain
>
>On Mon, Apr 15, 2013 at 5:11 PM, 谢良 <[EMAIL PROTECTED]> wrote:
>
>> First, it probably won't help to set the block size to 4KB; please refer to
>> the beginning of HFile.java:
>>
>>  * Smaller blocks are good for random access, but require more memory to
>>  * hold the block index, and may be slower to create (because we must flush
>>  * the compressor stream at the conclusion of each data block, which leads
>>  * to an FS I/O flush). Further, due to the internal caching in Compression
>>  * codec, the smallest possible block size would be around 20KB-30KB.
>>
>> Second, is it a single-threaded test client or multi-threaded? We can't
>> expect too much if the requests are issued one by one.
>>
>> Third, could you provide more info about your DN disk counts and IO
>> utilization?
>>
>> Thanks,
>> Liang
>> ________________________________________
>> From: Ankit Jain [[EMAIL PROTECTED]]
>> Sent: April 15, 2013 18:53
>> To: [EMAIL PROTECTED]
>> Subject: Re: HBase random read performance
>>
>> Hi Anoop,
>>
>> Thanks for the reply.
>>
>> I tried setting the HFile block size to 4KB and also enabled the bloom
>> filter (ROW). The maximum read performance I was able to achieve is
>> 10000 records in 14 secs (record size is 1.6KB).
>>
>> Please suggest some further tuning.
>>
>> Thanks,
>> Ankit Jain
>>
>>
>>
>> On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal <
>> [EMAIL PROTECTED]> wrote:
>>
>> > Interesting. Can you explain why this happens?
>> >
>> > -----Original Message-----
>> > From: Anoop Sam John [mailto:[EMAIL PROTECTED]]
>> > Sent: Monday, April 15, 2013 3:47 PM
>> > To: [EMAIL PROTECTED]
>> > Subject: RE: HBase random read performance
>> >
>> > Ankit,
>> >     I guess you might be using the default HFile block size,
>> > which is 64KB. For random gets a lower value will be better. Try with
>> > something like 8KB and check the latency.
>> >
>> > Yes, of course blooms can help (if major compaction was not done at the
>> > time of testing).
>> >
>> > -Anoop-
>> > ________________________________________
>> > From: Ankit Jain [[EMAIL PROTECTED]]
>> > Sent: Saturday, April 13, 2013 11:01 AM
>> > To: [EMAIL PROTECTED]
>> > Subject: HBase random read performance
>> >
>> > Hi All,
>> >
>> > We are using HBase 0.94.5 and Hadoop 1.0.4.
>> >
>> > We have an HBase cluster of 5 nodes (5 regionservers and 1 master node).
>> > Each regionserver has 8 GB RAM.
>> >
>> > We have loaded 25 million records into an HBase table; the table is
>> > pre-split into 16 regions and all the regions are equally loaded.
>> >
>> > We are getting very low random read performance while performing
>> > multi-get from HBase.
>> >
>> > We are passing random 10000 row-keys as input, while HBase is taking
>> > around 17 secs to return 10000 records.
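
For reference, the block size and bloom filter tuning discussed above
(Anoop's 8KB block size suggestion, Ankit's 4KB and 32KB runs, and the ROW
bloom filter) is a per-column-family setting. Below is a minimal sketch of
applying it through the 0.94-era admin API; the table and family names are
hypothetical, the bloom enum lives at StoreFile.BloomType in 0.94, and
existing HFiles only pick up the new settings once they are rewritten (e.g.
by a major compaction):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.regionserver.StoreFile;

public class TuneFamilyForRandomReads {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    String tableName = "mytable";                            // hypothetical table name
    HColumnDescriptor family = new HColumnDescriptor("cf");  // hypothetical family name
    family.setBlocksize(8 * 1024);                        // 8KB HFile blocks for random gets
    family.setBloomFilterType(StoreFile.BloomType.ROW);   // row-level bloom filter

    admin.disableTable(tableName);
    admin.modifyColumn(tableName, family);
    admin.enableTable(tableName);

    // New settings apply only to newly written HFiles; a major compaction
    // rewrites the existing ones.
    admin.majorCompact(tableName);
    admin.close();
  }
}

Which block size wins (4KB, 8KB, or 32KB) depends on the record size and the
cache hit rate, so it is worth re-running the same 10000-key test after each
change.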
Ted Yu 2013-04-15, 13:30
Ted Yu 2013-04-15, 14:13
Ted Yu 2013-04-15, 17:03
lars hofhansl 2013-04-16, 14:55
Liu, Raymond 2013-04-16, 07:49
Nicolas Liochon 2013-04-16, 08:22
Jean-Marc Spaggiari 2013-04-16, 11:01
Michel Segel 2013-04-17, 12:33
Håvard Wahl Kongsgård 2013-04-14, 22:19
Mohammad Tariq 2013-04-14, 22:39
Ted Yu 2013-07-08, 12:49