HBase >> mail # user >> batch update question


Re: batch update question

For the second part of the question: if you have 10 Puts, it's more efficient to send a single RS message with 10 Puts than to send 10 RS messages with 1 Put apiece.  The two words to be careful with, though, are "always" and "never", because there is an exception: if you are using the client writeBuffer and each of those 10 Puts is going to a different RegionServer, then you haven't really gained much.

To answer the next question of how you know where the Puts are going, see this method…

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[],%20boolean%29

Because the HBase client talks directly to each RS, it has to know the region boundaries.
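
To make the routing point concrete, here is a minimal sketch of how region boundaries decide where each Put goes, and therefore how many RS messages a batch really needs. It is a toy model and not the HBase client API: the class `RegionRoutingSketch`, its methods, the start keys, and the server names `rs1` and `rs2` are all hypothetical, and row keys are compared as Java strings rather than as byte arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only, NOT the HBase client API. Region start keys and
// server names ("rs1", "rs2") are hypothetical.
class RegionRoutingSketch {
    // Region start key -> server hosting that region.
    // The first region must be registered with the empty start key "".
    private final TreeMap<String, String> startKeyToServer = new TreeMap<>();

    void addRegion(String startKey, String server) {
        startKeyToServer.put(startKey, server);
    }

    // A row belongs to the region with the greatest start key <= the row key.
    String serverFor(String rowKey) {
        return startKeyToServer.floorEntry(rowKey).getValue();
    }

    // Groups row keys by destination server: one message per map entry.
    Map<String, List<String>> groupByServer(List<String> rowKeys) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String row : rowKeys) {
            groups.computeIfAbsent(serverFor(row), s -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        RegionRoutingSketch table = new RegionRoutingSketch();
        table.addRegion("", "rs1");   // rows sorting before "m"
        table.addRegion("m", "rs2");  // rows from "m" onward

        List<String> rows = Arrays.asList("apple", "banana", "melon", "zebra");
        Map<String, List<String>> groups = table.groupByServer(rows);
        System.out.println(groups.size() + " messages for " + rows.size() + " Puts: " + groups);
    }
}
```

In this model the gain from batching is the gap between the number of Puts and the number of groups. If every row in a batch lands on a different server, the two numbers are equal and batching saves nothing.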

From: Lin Ma <[EMAIL PROTECTED]>
Date: Thursday, September 6, 2012 11:54 AM
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>, Doug Meil <[EMAIL PROTECTED]>
Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: Re: batch update question

Thank you Doug,

Very effective reply. :-)

- Why could a batch update resolve the contention issue on the same row? Could you elaborate a bit more or show me an example?
- Does a batch update always perform better than single updates (when we measure total throughput)?

regards,
Lin

On Thu, Sep 6, 2012 at 12:59 AM, Doug Meil <[EMAIL PROTECTED]> wrote:

Hi there, if you look in the source code for HTable there is a list of Put
objects.  That's the buffer, and it's a client-side buffer.
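
As a rough illustration of what a client-side buffer means here, the sketch below keeps pending writes in a list and sends them in one go once a size threshold is crossed. It is a simplified model, not HTable's actual code: the class `WriteBufferSketch`, its threshold parameter, and the use of string length as a stand-in for byte size are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, NOT HBase's HTable. A "put" is just a string here, and
// its length in characters stands in for its size in bytes.
class WriteBufferSketch {
    private final List<String> buffer = new ArrayList<>(); // pending writes, held on the client
    private final long maxSize;   // hypothetical threshold that triggers a flush
    private long bufferedSize = 0;
    private int flushCount = 0;   // number of simulated sends to the server
    private int sentPuts = 0;     // total writes handed to the server so far

    WriteBufferSketch(long maxSize) {
        this.maxSize = maxSize;
    }

    // Buffers the write; nothing leaves the client until the buffer is full.
    void put(String put) {
        buffer.add(put);
        bufferedSize += put.length();
        if (bufferedSize > maxSize) {
            flush();
        }
    }

    // Sends everything buffered so far as one batch and empties the buffer.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sentPuts += buffer.size();
        flushCount++;
        buffer.clear();
        bufferedSize = 0;
    }

    int pending() { return buffer.size(); }
    int flushCount() { return flushCount; }
    int sentPuts() { return sentPuts; }

    public static void main(String[] args) {
        WriteBufferSketch table = new WriteBufferSketch(10);
        for (String put : new String[] {"aaaa", "bbbb", "cccc", "d"}) {
            table.put(put);
        }
        table.flush(); // push out whatever is still pending
        System.out.println(table.sentPuts() + " puts sent in " + table.flushCount() + " flushes");
    }
}
```

The final explicit `flush()` matters in this model: anything still sitting in the list when the client stops has never been sent anywhere.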

On 9/5/12 12:04 PM, "Lin Ma" <[EMAIL PROTECTED]> wrote:

>Thank you Stack for the detailed directions!
>
>1. You are right, I have not run into any real row contention issues. My
>purpose is to understand the issue in advance, and through it to
>understand HBase in general better;
>2. For the comment from the API page you referred to -- "If isAutoFlush
><http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTableInterface.html#isAutoFlush%28%29>
>is false, the update is buffered until the internal buffer is full." -- I
>am confused about what the buffer is. Is it a buffer on the client side or
>in the region server? Is there a way to configure how much it holds before
>flushing?
>3. Why could batching resolve contention on the same row in theory,
>compared to non-batch operations? Besides preparing a solution in
>advance, I want to learn a bit about why. :-)
>
>regards,
>Lin
>
>On Wed, Sep 5, 2012 at 4:00 AM, Stack <[EMAIL PROTECTED]> wrote:
>
>> On Sun, Sep 2, 2012 at 2:13 AM, Lin Ma <[EMAIL PROTECTED]> wrote:
>> > Hello guys,
>> >
>> > I am reading the book "HBase: The Definitive Guide". At the beginning of
>> > chapter 3, it mentions that, to reduce the performance impact of
>> > clients updating the same row (lock contention issues for automatic
>> > write), batch updates are preferred. My question is: for an MR job, what
>> > batch update methods could we leverage to resolve the issue? And for an
>> > API client, what batch update methods could we leverage to resolve
>> > the issue?
>> >
>>
>> Do you actually have a problem where there is contention on a single
>> row?
>>
>> Use methods like
>>
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#put(java.util.List)
>>
>> or the batch methods listed earlier in the API.  You should set
>> autoflush to false too:
>>
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTableInterface.html#isAutoFlush()
>>
>> Even with batching, a highly contended row might hold up inserts... but
>> are you sure you actually have this problem in the first place?
>>
>> St.Ack
>>
