-
Scanner timeout -- any reason not to raise?
Dan Crosta 2013-03-17, 18:46
We occasionally get scanner timeout errors such as "66698ms passed since the last invocation, timeout is currently set to 60000" when iterating a scanner through the Thrift API. Is there any reason not to raise the timeout above the default of 60s? Put another way, what resources (and how much of them) does a scanner take up on a Thrift server or region server?
Also, to confirm -- I believe "hbase.rpc.timeout" is the setting in question here, but someone please correct me if I'm wrong.
Thanks, - Dan
-
Re: Scanner timeout -- any reason not to raise?
Ted Yu 2013-03-17, 20:46
Which HBase version are you using?
In 0.94 and prior, the config param is hbase.regionserver.lease.period.
In 0.95, it is different. See the release notes of HBASE-6170.
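To make the error message concrete, here is a simplified model of what the timeout measures: the gap between consecutive scanner calls, not the total duration of the scan. This is an illustrative sketch under that assumption, not HBase's own lease code; the class `ScannerLeaseModel` and its methods are hypothetical, and the period it models corresponds to hbase.regionserver.lease.period in 0.94.

```java
import java.util.Optional;

// Hypothetical, simplified model of a scanner lease (NOT HBase's code):
// each call within the period restarts the timer, and a call arriving
// after the period has elapsed is rejected with a timeout message.
final class ScannerLeaseModel {
    private final long leasePeriodMs; // stands in for hbase.regionserver.lease.period
    private long lastInvocationMs;    // time of the last successful call

    ScannerLeaseModel(long leasePeriodMs, long openedAtMs) {
        this.leasePeriodMs = leasePeriodMs;
        this.lastInvocationMs = openedAtMs;
    }

    // Returns a timeout message if the lease has expired; otherwise renews it.
    Optional<String> next(long nowMs) {
        long elapsed = nowMs - lastInvocationMs;
        if (elapsed > leasePeriodMs) {
            return Optional.of(elapsed + "ms passed since the last invocation, "
                    + "timeout is currently set to " + leasePeriodMs);
        }
        lastInvocationMs = nowMs; // a call within the period renews the lease
        return Optional.empty();
    }
}
```

Feeding the model one call at 30 s and another 66698 ms later reproduces the message from the original post; only the gap between calls matters, never the overall length of the scan.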
-
Re: Scanner timeout -- any reason not to raise?
Dan Crosta 2013-03-17, 20:56
Ah, thanks Ted -- I was wondering what that setting was for.
We are using CDH 4.2.0, which is HBase 0.94.2 (give or take a few backports from 0.94.3).
Is there any harm in setting the lease timeout to something larger, like 5 or 10 minutes?
Thanks, - Dan
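The lease period is given in milliseconds (the default of 60s appears as 60000 in the error). A trivial sketch converting the 5- and 10-minute candidates with java.util.concurrent.TimeUnit, assuming the 0.94 property name Ted gives; the class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: converts candidate lease periods (in minutes) to the
// millisecond values that hbase.regionserver.lease.period takes in 0.94.
final class LeasePeriodValues {
    static long leasePeriodMs(long minutes) {
        return TimeUnit.MINUTES.toMillis(minutes);
    }

    public static void main(String[] args) {
        for (long minutes : new long[] {1, 5, 10}) {
            // prints e.g. "5 min -> hbase.regionserver.lease.period = 300000"
            System.out.println(minutes + " min -> hbase.regionserver.lease.period = "
                    + leasePeriodMs(minutes));
        }
    }
}
```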
-
Re: Scanner timeout -- any reason not to raise?
Ted Yu 2013-03-17, 21:20
The lease timeout is used by row locking too. That's the reason behind splitting the setting into two config parameters.
What is your load composition? Do you mostly serve reads from HBase?
Cheers
-
Re: Scanner timeout -- any reason not to raise?
Dan Crosta 2013-03-20, 16:32
I'm confused -- I only see one setting in CDH manager. What is the name of the other setting?
Our load is moderately frequent small writes (in batches of 1000 cells at a time, typically split over a few hundred rows; these complete very fast, and we haven't seen any timeouts there) and infrequent batches of large reads (scans), which is where we do see timeouts. My guess is that the timeouts come more from our application taking some time (apparently more than 60s) to process each batch of scan results than from slowness in HBase itself, which tends to be only moderately loaded (judging by CPU, network, and disk) while we do the reads.
Thanks, - Dan
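Dan's hypothesis can be checked with back-of-the-envelope arithmetic: if each call hands the application N rows and each row takes t ms to process, the next call arrives roughly N × t ms later, and that gap is what the lease measures. A sketch with a hypothetical helper; it assumes uniform per-row processing time (which you would have to measure) and ignores RPC time.

```java
// Hypothetical back-of-the-envelope helper. It ignores RPC and network time
// and assumes every row in a batch takes the same time to process.
final class ScanGapEstimate {
    // Approximate time between two scanner calls, in milliseconds.
    static long gapMs(int rowsPerCall, long msPerRow) {
        return rowsPerCall * msPerRow;
    }

    // True if the estimated gap stays strictly below the lease period.
    static boolean withinLease(int rowsPerCall, long msPerRow, long leasePeriodMs) {
        return gapMs(rowsPerCall, msPerRow) < leasePeriodMs;
    }
}
```

With made-up numbers, 1000 rows per call at 70 ms per row gives a 70000 ms gap, which would trip a 60000 ms lease even on an idle cluster; 500 rows per call brings the gap down to 35000 ms.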
-
Re: Scanner timeout -- any reason not to raise?
Ted Yu 2013-03-20, 17:00
In 0.94, there is only one setting.
See the release notes of HBASE-6170, which is in 0.95.

Looks like this should help (in 0.95):

https://issues.apache.org/jira/browse/HBASE-2214
Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly

From your description, you should be able to raise the timeout, since the writes are relatively fast.

Cheers
-
Re: Scanner timeout -- any reason not to raise?
Bryan Beaudreault 2013-03-20, 17:05
Typically it is better to use caching and batch size to limit the number of rows returned, and thus the amount of processing required between calls to next() during a scan, but it would be nice if HBase provided a way to manually refresh a lease, similar to Hadoop's context.progress(). In a cluster that is used for many different applications, upping the global lease timeout is a heavy-handed solution. Even being able to override the timeout on a per-scan basis would be nice.

Thoughts on that, Ted?
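The caching advice can be turned into a rough sizing rule. The helper below is a hypothetical sketch: it picks the largest rows-per-call value whose worst-case processing time stays within a fraction of the lease period, the kind of number one would pass to Scan.setCaching(int) in the Java client. It floors at 1; if a single row can take longer than the budget, no caching value helps, which is exactly the case the lease-refresh idea targets.

```java
// Hypothetical sizing helper (not part of HBase). safetyFactor in (0, 1]
// reserves headroom, e.g. 0.5 uses only half of the lease period.
final class CachingSizer {
    static int maxCaching(long leasePeriodMs, long worstCaseMsPerRow, double safetyFactor) {
        if (worstCaseMsPerRow <= 0) {
            throw new IllegalArgumentException("worstCaseMsPerRow must be > 0");
        }
        if (safetyFactor <= 0 || safetyFactor > 1) {
            throw new IllegalArgumentException("safetyFactor must be in (0, 1]");
        }
        long budgetMs = (long) (leasePeriodMs * safetyFactor);
        long rows = budgetMs / worstCaseMsPerRow; // rows that fit in the budget
        // Floor at 1: a scan must fetch at least one row per call.
        return (int) Math.max(1L, Math.min(rows, (long) Integer.MAX_VALUE));
    }
}
```

For example, a 60000 ms lease, a 70 ms worst case per row, and a 0.5 safety factor give 428 rows per call.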
-
Re: Scanner timeout -- any reason not to raise?
Ted Yu 2013-03-20, 17:11
> if HBase provided a way to manually refresh a lease similar to Hadoop's context.progress()

Can you outline how the above would work for a long scan?

> Even being able to override the timeout on a per-scan basis would be nice.

Agreed.
-
Re: Scanner timeout -- any reason not to raise?
Bryan Beaudreault 2013-03-20, 17:39
I was thinking something like this:

    Scan scan = new Scan(startRow, endRow);
    scan.setCaching(someVal); // chosen from the expected per-row processing time
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
        // usual processing, whose time we accounted for in our caching and global lease timeout settings
        if (someCondition) {
            // more time-intensive processing needed on this record, which is hard to account for in the caching
            scanner.progress(); // proposed lease-refresh call; not part of the current client API
        }
    }

I'm not sure how we could expose this in the context of a Hadoop job, since I don't believe we have access to the underlying scanner, but that would be great also.
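The proposed semantics can be modelled without HBase: progress() would reset the lease timer without fetching rows, so a long pause is fine as long as each stretch between renewals stays under the period. This is a hypothetical sketch of the proposal only; neither `RenewableLease` nor a `progress()` method exists in the 0.94 client.

```java
// Hypothetical model of the proposed lease-refresh call; not HBase code.
final class RenewableLease {
    private final long leasePeriodMs;
    private long lastRenewedMs;

    RenewableLease(long leasePeriodMs, long nowMs) {
        this.leasePeriodMs = leasePeriodMs;
        this.lastRenewedMs = nowMs;
    }

    boolean isExpired(long nowMs) {
        return nowMs - lastRenewedMs > leasePeriodMs;
    }

    // Proposed semantics: reset the lease timer without fetching any rows.
    void progress(long nowMs) {
        if (isExpired(nowMs)) {
            // Once expired, the server-side scanner is gone; renewal is too late.
            throw new IllegalStateException("lease already expired");
        }
        lastRenewedMs = nowMs;
    }
}
```

In this model, calling progress() at 50 s keeps the lease alive at 100 s, while the same pause without a renewal expires. Because an expired lease cannot be revived, the application would still need to call progress() at least once per lease period during slow processing.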
-
Re: Scanner timeout -- any reason not to raise?
Ted Yu 2013-03-20, 17:56
Bryan:
Interesting idea.

You can log a JIRA with these two suggestions.
-
Re: Scanner timeout -- any reason not to raise?
Bryan Beaudreault 2013-03-20, 19:13
Thanks Ted, I've submitted https://issues.apache.org/jira/browse/HBASE-8157.