Alex Baranau
2011-06-27, 20:08
Ted Yu
2011-06-27, 20:16
Alex Baranau
2011-06-27, 20:33
Stack
2011-06-27, 20:40
Alex Baranau
2011-06-27, 20:56
Stack
2011-06-27, 21:21
Joey Echeverria
2011-06-27, 22:23
Gary Helmling
2011-06-28, 16:17
Doug Meil
2011-06-28, 16:40
Alex Baranau
2011-06-28, 17:07
Doug Meil
2011-06-28, 19:41
Alex Baranau
2011-06-29, 06:17
Doug Meil
2011-06-29, 13:44
Alex Baranau
2011-06-29, 14:57
Andrew Purtell
2011-06-28, 23:17
Todd Lipcon
2011-06-28, 15:45
-
Retry HTable.put() on client-side to handle temp connectivity problem
Alex Baranau 2011-06-27, 20:08
Hello,
Just wanted to confirm that I'm doing this properly. How about this code to handle temporary cluster connectivity problems (or cluster downtime) on the client side?

+ // HTable.put() will fail with an exception if the connection to the cluster is temporarily broken or
+ // the cluster is temporarily down. To be sure the data is written, we retry the write.
+ boolean dataWritten = false;
+ do {
+   try {
+     table.put(p);
+     dataWritten = true;
+   } catch (IOException ioe) { // indicates a cluster connectivity problem (also thrown when the cluster is down)
+     LOG.error("Writing data to HBase failed, will try again in " + RETRY_INTERVAL_ON_WRITE_FAIL + " sec", ioe);
+     // Thread.sleep() rather than wait(): wait() requires owning the object's monitor
+     Thread.sleep(RETRY_INTERVAL_ON_WRITE_FAIL * 1000);
+   }
+ } while (!dataWritten);

Thank you in advance,
Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
Alex Baranau 2011-06-27, 20:08
-
Re: Retry HTable.put() on client-side to handle temp connectivity problem
Ted Yu 2011-06-27, 20:16
This would retry indefinitely, right?
Normally a maximum retry duration would govern how long the retry is attempted.
Ted Yu 2011-06-27, 20:16
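A minimal sketch of the duration-bounded retry Ted describes, not code from this thread. It assumes the write is wrapped in a hypothetical IoAction interface (a stand-in for a call such as table.put(p)); the class name, method name, and parameters are invented for illustration.

```java
import java.io.IOException;

/** Illustrative only: retries an action until it succeeds or a time budget runs out. */
final class DeadlineRetry {

    /** Hypothetical stand-in for a call such as table.put(p). */
    interface IoAction {
        void run() throws IOException;
    }

    /**
     * Runs the action, sleeping pauseMillis between failed attempts.
     * Rethrows the last IOException once another pause would exceed
     * maxDurationMillis. Returns the number of attempts made on success.
     */
    static int runWithDeadline(IoAction action, long pauseMillis, long maxDurationMillis)
            throws IOException, InterruptedException {
        long pauseNanos = pauseMillis * 1_000_000L;
        long deadline = System.nanoTime() + maxDurationMillis * 1_000_000L;
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                action.run();
                return attempts;
            } catch (IOException ioe) {
                // Give up if sleeping again would take us past the deadline.
                if (deadline - System.nanoTime() < pauseNanos) {
                    throw ioe;
                }
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```

With a very large maxDurationMillis this behaves much like the original loop; with a small one the caller gets the IOException back and can decide what to do next.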
-
Re: Retry HTable.put() on client-side to handle temp connectivity problem
Alex Baranau 2011-06-27, 20:33
Yes, that is what's intended, I think. To make the whole picture clear, here's the context:

* there's a Flume HBase sink (read: HBase client) which writes data from a Flume "pipe" (read: some event-based message source) to an HTable;
* when HBase is down for some time (with the default HBase configuration on the Flume sink side), HTable.put throws an exception and the client exits (it usually takes ~10 min to fail);
* Flume is smart enough to reliably accumulate the data to be written if the sink behaves badly (not writing for some time, pauses, etc.), so it would be great if the sink kept trying to write data until HBase is up again, BUT:
* here, because the sink process fails completely (the thread needs to be restarted), the data never reaches the HTable even after the HBase cluster is brought back up.

So you suggest, instead of this extra construction around HTable.put, using the configuration properties "hbase.client.pause" and "hbase.client.retries.number"? I.e. setting the retry attempts to be (reasonably) close to "retry forever". Is that what you meant?

Thank you,
Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
Alex Baranau 2011-06-27, 20:33
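A rough worked example of the "close to forever" idea, under a loudly stated assumption: the client sleeps a constant "hbase.client.pause" between attempts. The real HBase client may scale the pause with a backoff schedule, so the elapsed time in practice can differ; the class and method names below are hypothetical and are not part of HBase.

```java
/** Illustrative arithmetic only; assumes a constant pause between retries. */
final class RetryBudget {

    /**
     * Smallest retry count whose pauses add up to at least outageMillis,
     * assuming every pause lasts exactly pauseMillis.
     */
    static long retriesToCover(long outageMillis, long pauseMillis) {
        if (pauseMillis <= 0) {
            throw new IllegalArgumentException("pauseMillis must be positive");
        }
        // Ceiling division: a partial pause still costs one more retry.
        return (outageMillis + pauseMillis - 1) / pauseMillis;
    }
}
```

Under that assumption, riding out a one-hour outage with a 1000 ms pause needs retriesToCover(3_600_000L, 1_000L) retries, i.e. 3600.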
-
Re: Retry HTable.put() on client-side to handle temp connectivity problem
Stack 2011-06-27, 20:40
Either should work, Alex. Your version will go "for ever". Have you tried yanking hbase out from under the client to see if it reconnects?

Good on you,
St.Ack
Stack 2011-06-27, 20:40
-
Re: Retry HTable.put() on client-side to handle temp connectivity problem
Alex Baranau 2011-06-27, 20:56
The code I pasted works for me: it reconnects successfully. I just thought it might not be the best way to do it. I realized that by using HBase configuration properties we could just say that it's up to the user to configure the HBase client (created by Flume) properly (e.g. by adding an hbase-site.xml with the settings to the classpath). On the other hand, it looks to me like users of HBase sinks will *always* want the sink to retry writing to HBase until it succeeds. But the default configuration doesn't work this way: the sink stops when HBase is temporarily down or inaccessible. That makes using the sink more complicated (because the default configuration sucks), which I'd like to avoid here by adding the code above. Ideally the default configuration should work best for the general-purpose case.

I understand the ways to implement/configure such behavior. I think we should discuss what the best default behavior is, and whether we need to let the user override it, on the Flume ML (or directly at https://issues.cloudera.org/browse/FLUME-685).

Thank you guys,
Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
Alex Baranau 2011-06-27, 20:56
-
Re: Retry HTable.put() on client-side to handle temp connectivity problem
Stack 2011-06-27, 21:21
I'd be fine with changing the default in hbase so clients just keep trying. What do others think?
St.Ack
Stack 2011-06-27, 21:21
-
Re: Retry HTable.put() on client-side to handle temp connectivity problem
Joey Echeverria 2011-06-27, 22:23
If I could override the default, I'd be a hesitant +1. I'd rather see the default be something like retry 10 times, then throw an error, with one option being infinite retries.

-Joey

Joseph Echeverria
Cloudera, Inc.
443.305.9434
Joey Echeverria 2011-06-27, 22:23
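One way to picture the policy Joey suggests (a bounded default with "retry forever" as an opt-in) is the sketch below. It is illustrative only, not HBase's implementation; IoAction, RETRY_FOREVER, and the method signature are hypothetical names chosen for this example.

```java
import java.io.IOException;

/** Illustrative only: bounded retries by default, with an explicit "forever" option. */
final class BoundedRetry {

    /** Hypothetical sentinel meaning "never give up". */
    static final int RETRY_FOREVER = -1;

    /** Hypothetical stand-in for a call such as table.put(p). */
    interface IoAction {
        void run() throws IOException;
    }

    /**
     * Tries the action once, then retries up to maxRetries more times,
     * pausing pauseMillis between attempts. Rethrows the last IOException
     * when the retries are used up. Returns the number of attempts on success.
     */
    static int run(IoAction action, int maxRetries, long pauseMillis)
            throws IOException, InterruptedException {
        int failures = 0;
        while (true) {
            try {
                action.run();
                return failures + 1;
            } catch (IOException ioe) {
                failures++;
                // failures counts the initial attempt too, hence the strict '>'.
                if (maxRetries != RETRY_FOREVER && failures > maxRetries) {
                    throw ioe;
                }
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```

A caller such as the Flume sink could pass RETRY_FOREVER, while a latency-sensitive caller could pass 10 and handle the resulting IOException.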
-
Re: Retry HTable.put() on client-side to handle temp connectivity problem
Gary Helmling 2011-06-28, 16:17
I'd also be wary of changing the default to retry forever. This might be hard to differentiate from a hang or deadlock for new users and seems to violate "least surprise".

In many cases it's preferable to have some kind of predictable failure as well. So I think this would appear to be a regression in behavior. If you're serving, say, web site data from hbase, you may prefer an occasional error or timeout rather than having page loading hang forever.

I'm all for making "retry forever" a configurable option, but do we need any new knobs here?

--gh
Gary Helmling 2011-06-28, 16:17
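For the web-serving case Gary mentions, a caller can put its own upper bound on how long it waits, whatever the client's retry settings are. The sketch below uses only java.util.concurrent; the class and method names are hypothetical, and creating an executor per call is done purely to keep the example short (a real service would presumably share one).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative only: bounds how long a caller waits for a possibly-retrying call. */
final class TimedCall {

    /**
     * Runs the task on a worker thread and waits at most timeoutMillis for it.
     * Throws TimeoutException if the task has not finished in time; the worker
     * is then interrupted so that it does not linger.
     */
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException te) {
                future.cancel(true); // interrupts the worker thread
                throw te;
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The page handler can then turn a TimeoutException into an error response instead of hanging.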
-
RE: Retry HTable.put() on client-side to handle temp connectivity problem
Doug Meil 2011-06-28, 16:40
I agree with what Todd & Gary said. I don't like retry-forever, especially as a default option in HBase.
Doug Meil 2011-06-28, 16:40
-
Re: Retry HTable.put() on client-side to handle temp connectivity problem
Alex Baranau 2011-06-28, 17:07
> if the sink "dies" for some reason, then it should
> push that back to the upstream parts of the flume dataflow, and have them
> buffer data on local disk.

True. But this seems to be a separate issue: https://issues.cloudera.org/browse/FLUME-390.

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
Alex Baranau 2011-06-28, 17:07
-
Re: Retry HTable.put() on client-side to handle temp connectivity problemDoug Meil 2011-06-28, 19:41
But if Flume used the HTable 'batch' method instead of 'put'...

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29

... doesn't it sidestep this issue? Because instead of being unsure what was in the write-buffer and what wasn't, the caller knows exactly what was sent and whether it was sent without error.
Doug Meil 2011-06-28, 19:41
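A minimal sketch of the batch-it-yourself idea from the message above: the caller owns the list of records, so after a send it knows exactly which ones were not written. `RecordStore` and `CallerOwnedBatch` are hypothetical names used only for illustration; they are not part of the HBase or Flume APIs, and a real sink would call into HTable where `store.write` appears.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a store client; NOT the HBase API. */
interface RecordStore {
    void write(String record) throws IOException;
}

/** Sends a caller-owned batch and reports exactly which records were not written. */
final class CallerOwnedBatch {
    private CallerOwnedBatch() {}

    /** Attempts every record once and returns the ones that failed, in their original order. */
    static List<String> sendAll(RecordStore store, List<String> batch) {
        List<String> failed = new ArrayList<>();
        for (String record : batch) {
            try {
                store.write(record);
            } catch (IOException e) {
                // The caller still holds the record, so it can retry it or push it back upstream.
                failed.add(record);
            }
        }
        return failed;
    }
}
```

Because the failed records are returned rather than left in a hidden buffer, the caller can decide per batch whether to retry, to withhold an ack, or to hand the data back to an upstream stage.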
-
Re: Retry HTable.put() on client-side to handle temp connectivity problemAlex Baranau 2011-06-29, 06:17
I think you are talking here about losing some data from the client-side buffer. I don't think using batch will help. If we use batch from client code and want to use client-side buffering, we would need to implement the same buffering code that is already implemented in HTable. The behavior and ack sending will be the same: the ack is sent after the Flume sink receives the event, which might be buffered and not (yet) persisted to HBase. I haven't looked into Flume's ability to skip sending the ack when the sink receives an event and to send acks in batches later (after the actual persisting happens). I will investigate that as a separate effort.

In general, please correct me if I'm wrong, but there won't be much difference between using HTable's batch and put:
* with put() I can also tell what was persisted and which records failed, as they will be available in the client-side buffer after failures
* internally put uses batch anyway (i.e. connection.processBatch)

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
Alex Baranau 2011-06-29, 06:17
-
Re: Retry HTable.put() on client-side to handle temp connectivity problemDoug Meil 2011-06-29, 13:44
Hi there-

1) Buffer/Batch

Addressing the comment in the Cloudera ticket (FLUME-390) "currently non-written events are lost.", I agree that two paths (write-buffer vs. batch-it-yourself) are available for Flume to recover from a failure and know what hasn't been sent (or what was at least attempted to be sent).

Thus, I don't see this as an "HBase issue". There are existing APIs for Flume to utilize that will get the job done.

2) Retry-forever

I've seen several folks vote -1 on retry-forever as default behavior. Based on the conversation I'm assuming this won't happen.

Are there other aspects to this issue? It doesn't seem like any HBase changes are needed to address these issues.
Doug Meil 2011-06-29, 13:44
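A minimal sketch of the behavior the thread converges on: retry a bounded number of times and then throw, with retry-forever available only as an explicit opt-in. `WriteAction`, `BoundedRetry`, and `RETRY_FOREVER` are hypothetical names for illustration, not HBase or Flume APIs; in the sink, the action would wrap `table.put(p)`. The sketch pauses with `Thread.sleep`, because the `Thread.currentThread().wait(...)` call in the loop from the first post would throw `IllegalMonitorStateException` unless the caller holds that object's monitor.

```java
import java.io.IOException;

/** Hypothetical write action; in the thread's code this would wrap table.put(p). */
interface WriteAction {
    void run() throws IOException;
}

final class BoundedRetry {
    /** Pass this as maxAttempts to opt in to retrying forever. */
    static final int RETRY_FOREVER = -1;

    private BoundedRetry() {}

    /**
     * Runs the action until it succeeds or maxAttempts is exhausted.
     * Returns the number of attempts made; rethrows the last IOException on give-up.
     */
    static int writeWithRetry(WriteAction action, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts == 0 || maxAttempts < RETRY_FOREVER) {
            throw new IllegalArgumentException("maxAttempts must be positive or RETRY_FOREVER");
        }
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                action.run();
                return attempt;
            } catch (IOException ioe) {
                if (maxAttempts != RETRY_FOREVER && attempt >= maxAttempts) {
                    throw ioe; // predictable failure instead of something that looks like a hang
                }
                Thread.sleep(pauseMillis); // sleep, not wait(): no monitor is held here
            }
        }
    }
}
```

A caller that wants the default suggested in the thread would pass 10 as `maxAttempts`; a Flume sink that prefers to block until HBase is back could pass `BoundedRetry.RETRY_FOREVER`.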
-
Re: Retry HTable.put() on client-side to handle temp connectivity problemAlex Baranau 2011-06-29, 14:57
All correct. No changes in HBase are needed (none were requested, actually; changing the default retry behavior was just a suggestion by Stack). Thank you all for participating!

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
Alex Baranau 2011-06-29, 14:57
-
Re: Retry HTable.put() on client-side to handle temp connectivity problemAndrew Purtell 2011-06-28, 23:17
I also think if it takes 10 minutes to fail, that is probably too long.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Andrew Purtell 2011-06-28, 23:17
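One way to read the point above is that the limit worth configuring is elapsed time rather than attempt count. The following is a minimal sketch under that assumption; `TimedWrite`, `DeadlineRetry`, and the injectable millisecond clock are hypothetical and exist only so the policy can be shown and tested without real waiting. Nothing here is an HBase or Flume API.

```java
import java.io.IOException;
import java.util.function.LongSupplier;

/** Hypothetical write action; a real sink would wrap its HTable call here. */
interface TimedWrite {
    void run() throws IOException;
}

final class DeadlineRetry {
    private DeadlineRetry() {}

    /**
     * Retries until the action succeeds or the time budget, measured from the first
     * attempt, would be exceeded. Rethrows the last IOException when giving up.
     */
    static void writeWithin(TimedWrite action, long budgetMillis, long pauseMillis,
                            LongSupplier clockMillis)
            throws IOException, InterruptedException {
        long deadline = clockMillis.getAsLong() + budgetMillis;
        while (true) {
            try {
                action.run();
                return;
            } catch (IOException ioe) {
                if (clockMillis.getAsLong() + pauseMillis > deadline) {
                    throw ioe; // another pause would overrun the budget
                }
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```

In production code the clock argument could simply be `System::currentTimeMillis`; the budget then bounds how long a caller can be blocked, whatever the number of attempts turns out to be.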
-
Re: Retry HTable.put() on client-side to handle temp connectivity problemTodd Lipcon 2011-06-28, 15:45
With Flume's store-and-forward, why do we need retry-forever on the HBase side? It seems to me that if the sink "dies" for some reason, then it should push that back to the upstream parts of the flume dataflow, and have them buffer data on local disk.

-Todd

Todd Lipcon
Software Engineer, Cloudera
Todd Lipcon 2011-06-28, 15:45
|