HBase, mail # user - Scanner timeout -- any reason not to raise?


Re: Scanner timeout -- any reason not to raise?
Bryan Beaudreault 2013-03-20, 17:05
Typically it is better to use the scanner caching and batch size settings to
limit the number of rows returned per fetch, and thus the amount of
processing required between calls to next() during a scan. That said, it
would be nice if HBase provided a way to manually refresh a lease, similar
to Hadoop's context.progress(). In a cluster that is used for many different
applications, raising the global lease timeout is a heavy-handed solution.
Even being able to override the timeout on a per-scan basis would be nice.

Thoughts on that, Ted?
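
To make the caching point concrete, here is a rough back-of-the-envelope
sketch in plain Python. It is not HBase API code. It assumes the lease is
renewed only when the client goes back to the region server for the next
chunk of rows, so the gap between fetches is roughly the rows per fetch
multiplied by the per-row processing time. The function names, the 70 ms
per-row cost, and the 50% safety margin are all made up for illustration;
only the 60000 ms timeout comes from this thread.

```python
def time_between_fetches_ms(rows_per_fetch, per_row_ms):
    """Approximate client-side time spent processing one fetched chunk."""
    return rows_per_fetch * per_row_ms


def max_safe_caching(lease_ms, per_row_ms, safety=0.5):
    """Largest rows-per-fetch whose processing time stays within
    safety * lease_ms (hypothetical helper, not part of HBase)."""
    if lease_ms <= 0 or per_row_ms <= 0:
        raise ValueError("lease_ms and per_row_ms must be positive")
    if not 0 < safety <= 1:
        raise ValueError("safety must be in (0, 1]")
    budget_ms = lease_ms * safety
    # Always fetch at least one row, even if a single row blows the budget.
    return max(1, int(budget_ms // per_row_ms))


if __name__ == "__main__":
    lease_ms = 60000   # default timeout quoted in the error message
    per_row_ms = 70    # assumed application cost per row (illustrative)

    # Fetching 1000 rows at a time would overrun the lease.
    print(time_between_fetches_ms(1000, per_row_ms) > lease_ms)  # True

    # A smaller rows-per-fetch keeps each chunk well inside the lease.
    print(max_safe_caching(lease_ms, per_row_ms))  # 428
```

If the measured per-row cost really is that high, lowering the rows
requested per fetch should avoid the timeout without touching the global
setting, at the price of more round trips.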
On Wed, Mar 20, 2013 at 1:00 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> In 0.94, there is only one setting.
> See release notes of HBASE-6170 which is in 0.95
>
> Looks like this should help (in 0.95):
>
> https://issues.apache.org/jira/browse/HBASE-2214
> Do HBASE-1996 -- setting size to return in scan rather than count of rows
> -- properly
>
> From your description, you should be able to raise the timeout since the
> writes are relatively fast.
>
> Cheers
>
> On Wed, Mar 20, 2013 at 9:32 AM, Dan Crosta <[EMAIL PROTECTED]> wrote:
>
> > I'm confused -- I only see one setting in CDH manager, what is the name of
> > the other setting?
> >
> > Our load is moderately frequent small writes (in batches of 1000 cells at
> > a time, typically split over a few hundred rows -- these complete very
> > fast, we haven't seen any timeouts there), and infrequent batches of large
> > reads (scans), which is where we do see timeouts. My guess is that the
> > timeout is more due to our application taking some time -- apparently more
> > than 60s -- to process the results of each scan's output, rather than due
> > to slowness in HBase itself, which tends to be only moderately loaded
> > (judging by CPU, network, and disk) while we do the reads.
> >
> > Thanks,
> > - Dan
> >
> > On Mar 17, 2013, at 2:20 PM, Ted Yu wrote:
> >
> > > The lease timeout is used by row locking too.
> > > That's the reason behind splitting the setting into two config parameters.
> > >
> > > What is your load composition? Do you mostly serve reads from HBase?
> > >
> > > Cheers
> > >
> > > On Sun, Mar 17, 2013 at 1:56 PM, Dan Crosta <[EMAIL PROTECTED]> wrote:
> > >
> > >> Ah, thanks Ted -- I was wondering what that setting was for.
> > >>
> > >> We are using CDH 4.2.0, which is HBase 0.94.2 (give or take a few
> > >> backports from 0.94.3).
> > >>
> > >> Is there any harm in setting the lease timeout to something larger, like 5
> > >> or 10 minutes?
> > >>
> > >> Thanks,
> > >> - Dan
> > >>
> > >> On Mar 17, 2013, at 1:46 PM, Ted Yu wrote:
> > >>
> > >>> Which HBase version are you using?
> > >>>
> > >>> In 0.94 and prior, the config param is hbase.regionserver.lease.period
> > >>>
> > >>> In 0.95, it is different. See release notes of HBASE-6170
> > >>>
> > >>> On Sun, Mar 17, 2013 at 11:46 AM, Dan Crosta <[EMAIL PROTECTED]> wrote:
> > >>>
> > >>>> We occasionally get scanner timeout errors such as "66698ms passed since
> > >>>> the last invocation, timeout is currently set to 60000" when iterating a
> > >>>> scanner through the Thrift API. Is there any reason not to raise the
> > >>>> timeout to something larger than the default 60s? Put another way, what
> > >>>> resources (and how much of them) does a scanner take up on a thrift server
> > >>>> or region server?
> > >>>>
> > >>>> Also, to confirm -- I believe "hbase.rpc.timeout" is the setting in
> > >>>> question here, but someone please correct me if I'm wrong.
> > >>>>
> > >>>> Thanks,
> > >>>> - Dan