Hello guys,
I am reading the book "HBase: The Definitive Guide". At the beginning of chapter 3, it says that batch updates are preferred in order to reduce the performance impact of clients updating the same row (lock contention caused by atomic writes). My question is: for an MR job, which batch update methods can we use to address the issue? And for an API client, which batch update methods can we use?
Thanks in advance,
Lin
-
Re: batch update question
Lin Ma 2012-09-04, 15:21
Hi Christian,

I read through the link you referred to. It seems HBaseHUT is exactly the solution I am looking for. Before committing to it, I want to learn a bit more about its internal design and the general idea of how HBaseHUT improves write throughput. The discussion mentions CP, but I cannot find more details. I would appreciate it if you could point me to more detailed documents. Thanks.

Regards,
Lin

On Tue, Sep 4, 2012 at 5:28 AM, Christian Schäfer <[EMAIL PROTECTED]> wrote:
> Hi,
>
> maybe you could be interested in HBase HUT (high update throughput); see
> https://github.com/sematext/HBaseHUT
-
Re: batch update question
Stack 2012-09-04, 20:00
Do you actually have a problem where there is contention on a single row?

Use methods like
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#put(java.util.List)
or the batch methods listed earlier in the API. You should set autoflush to false too:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTableInterface.html#isAutoFlush()

Even with batching, a highly contended row might hold up inserts... but are you sure you actually have this problem in the first place?

St.Ack
-
Re: batch update question
Christian Schäfer 2012-09-04, 20:04
Hi Lin,

check out the slides about high update workloads and HBaseHUT at:
http://blog.sematext.com/?s=hbasehut

Maybe you could ask Alex Baranau about the details here on the list, so that he can share them.

Regards,
Chris
-
Re: batch update question
Lin Ma 2012-09-05, 16:04
Thank you, Stack, for the detailed directions!

1. You are right, I have not run into any real row contention issues. My purpose is to understand the issue in advance, and through it to understand HBase in general better.
2. Regarding the comment on the API page you referred to, "If isAutoFlush <http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTableInterface.html#isAutoFlush%28%29> is false, the update is buffered until the internal buffer is full": I am confused about what the buffer is. Is it a buffer on the client side or in the region server? Is there a way to configure how much it holds before flushing?
3. In theory, why would batching resolve contention on the same row, compared to non-batch operations? Besides preparing a solution in advance, I want to learn a bit about why. :-)

Regards,
Lin
-
Re: batch update question
Doug Meil 2012-09-05, 16:59
Hi there, if you look in the source code for HTable there is a list of Put objects. That's the buffer, and it's a client-side buffer.
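The client-side buffer discussed in this thread can be pictured with a small toy model. This is only a sketch and not HBase code: `FakePut`, `FakeTable`, and `roundTripsFor` are hypothetical stand-ins whose method names merely echo HTable, the 1024-byte buffer size is an arbitrary assumption, and the model pretends that every flush costs exactly one round trip to a single server.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a client-side write buffer. Not HBase code. */
class WriteBufferSketch {

    /** Hypothetical stand-in for a Put: a row key plus a payload size. */
    static final class FakePut {
        final String row;
        final int sizeBytes;
        FakePut(String row, int sizeBytes) {
            this.row = row;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Hypothetical stand-in for a table handle with a client-side buffer. */
    static final class FakeTable {
        private final List<FakePut> writeBuffer = new ArrayList<>();
        private final long writeBufferSize; // flush threshold in bytes (assumed)
        private boolean autoFlush = true;
        private long bufferedBytes = 0;
        int roundTrips = 0;                 // simulated messages sent to the server

        FakeTable(long writeBufferSize) {
            this.writeBufferSize = writeBufferSize;
        }

        void setAutoFlush(boolean autoFlush) {
            this.autoFlush = autoFlush;
        }

        /** Buffers one put; sends at once if autoflush is on or the buffer is over its limit. */
        void put(FakePut p) {
            writeBuffer.add(p);
            bufferedBytes += p.sizeBytes;
            if (autoFlush || bufferedBytes > writeBufferSize) {
                flushCommits();
            }
        }

        /** Sends everything currently buffered as one simulated message. */
        void flushCommits() {
            if (writeBuffer.isEmpty()) {
                return;
            }
            roundTrips++;
            writeBuffer.clear();
            bufferedBytes = 0;
        }
    }

    /** Writes the given number of 100-byte puts and reports the simulated round trips. */
    static int roundTripsFor(boolean autoFlush, int puts) {
        FakeTable table = new FakeTable(1024);
        table.setAutoFlush(autoFlush);
        for (int i = 0; i < puts; i++) {
            table.put(new FakePut("row-" + i, 100));
        }
        table.flushCommits(); // push out whatever is still buffered
        return table.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("autoflush on:  " + roundTripsFor(true, 10));
        System.out.println("autoflush off: " + roundTripsFor(false, 10));
    }
}
```

In this model, ten puts with autoflush on cost ten simulated round trips, while the same ten puts with autoflush off fit in the buffer and leave in one. The model says nothing about row lock contention on the server; it only illustrates where the buffering happens.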
-
Re: batch update question
Doug Meil 2012-09-05, 17:01
Hi there, for more information about the HBase client, see
http://hbase.apache.org/book.html#client
-
Re: batch update question
Lin Ma 2012-09-06, 15:54
Thank you Doug,

Very effective reply. :-)

- Why can batch updates resolve contention on the same row? Could you elaborate a bit more or show me an example?
- Do batch updates always perform better than single updates (when we measure total throughput)?

Regards,
Lin
-
Re: batch update question
Doug Meil 2012-09-06, 18:26
For the 2nd part of the question, if you have 10 Puts it's more efficient to send a single RS message with 10 Puts than to send 10 RS messages with 1 Put apiece. There are 2 words to be careful with, and those are "always" and "never", because there is an exception: if you are using the client writeBuffer and each of those 10 Puts is going to a different RegionServer, then you haven't really gained much.

To answer the next question of how you know where the Puts are going, see this method:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[],%20boolean%29

Because the HBase client talks directly to each RS, it has to know the region boundaries.
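Doug's caveat about Puts that land on different RegionServers can be illustrated with a small stand-alone sketch. It is not the HBase client: the region map, the server names `rs1` to `rs3`, and the helpers `serverFor`, `groupByServer`, and `messagesNeeded` are all hypothetical. In a real client the lookup would go through something like the `getRegionLocation` method linked above. The sketch assumes one message per server per batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of grouping a batch of row keys by the server that hosts them. Not HBase code. */
class RegionGroupingSketch {

    /** Hypothetical region boundaries: region start key -> server name. */
    static final TreeMap<String, String> REGION_STARTS = new TreeMap<>();
    static {
        REGION_STARTS.put("", "rs1");  // rows sorting before "h"
        REGION_STARTS.put("h", "rs2"); // rows from "h" up to, but not including, "p"
        REGION_STARTS.put("p", "rs3"); // rows from "p" onwards
    }

    /** Finds the server whose region contains the row (greatest start key <= row). */
    static String serverFor(String row) {
        return REGION_STARTS.floorEntry(row).getValue();
    }

    /** Splits one client-side batch into per-server batches. */
    static Map<String, List<String>> groupByServer(List<String> rows) {
        Map<String, List<String>> perServer = new TreeMap<>();
        for (String row : rows) {
            perServer.computeIfAbsent(serverFor(row), k -> new ArrayList<>()).add(row);
        }
        return perServer;
    }

    /** One simulated message per server that receives at least one row. */
    static int messagesNeeded(List<String> rows) {
        return groupByServer(rows).size();
    }

    public static void main(String[] args) {
        List<String> sameRegion = Arrays.asList("apple", "banana", "cherry");
        List<String> spreadOut = Arrays.asList("apple", "hat", "zebra");
        System.out.println("same region: " + messagesNeeded(sameRegion) + " message(s)");
        System.out.println("spread out:  " + messagesNeeded(spreadOut) + " message(s)");
    }
}
```

With these assumed boundaries, three rows in one region need a single message, while three rows spread over three servers need three, which is the case where batching gains little.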
-
Re: batch update question
Lin Ma 2012-09-07, 15:39
Thank you Doug.

I still have one point of confusion. My original question is why batch updates would resolve (or at least improve) the performance issue caused by multiple clients contending to update the same row. Do you have any ideas or comments?

Regards,
Lin