Re: Scanner timeout -- any reason not to raise?
bq.  if HBase provided a way to manually refresh a lease similar to
Hadoop's context.progress()

Can you outline how the above would work for a long scan?
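To make the question concrete, here is a toy model of what a manually refreshable scanner lease might look like. It is a sketch under stated assumptions, not HBase code: `ScannerLeases`, `renew()`, `LeaseExpired`, and the injectable clock are hypothetical names. Only the expiry check and its message are modeled on the error Dan quotes further down the thread. Every call that touches a scanner restarts its lease, so a client doing slow per-batch work could call `renew()` partway through, much as a MapReduce task calls `context.progress()`.

```python
class LeaseExpired(Exception):
    """Raised when a scanner is touched after its lease has lapsed."""


class FakeClock:
    """Deterministic millisecond clock so the example needs no real waiting."""

    def __init__(self):
        self.now = 0

    def __call__(self):
        return self.now

    def advance(self, ms):
        self.now += ms


class ScannerLeases:
    """Toy model of server-side scanner leases (hypothetical, not HBase code)."""

    def __init__(self, period_ms, clock):
        self.period_ms = period_ms
        self.clock = clock      # callable returning "now" in milliseconds
        self.last_touch = {}    # scanner id -> time of the last touch

    def open(self, scanner_id):
        self.last_touch[scanner_id] = self.clock()

    def _touch(self, scanner_id):
        now = self.clock()
        elapsed = now - self.last_touch[scanner_id]
        if elapsed > self.period_ms:
            del self.last_touch[scanner_id]  # an expired scanner is discarded
            raise LeaseExpired(
                f"{elapsed}ms passed since the last invocation, "
                f"timeout is currently set to {self.period_ms}"
            )
        self.last_touch[scanner_id] = now    # any touch restarts the lease

    def next(self, scanner_id):
        # A real next() would also return the next batch of rows.
        self._touch(scanner_id)

    def renew(self, scanner_id):
        # Hypothetical: refresh the lease without fetching rows,
        # analogous to context.progress() in a MapReduce task.
        self._touch(scanner_id)


if __name__ == "__main__":
    clock = FakeClock()
    leases = ScannerLeases(period_ms=60_000, clock=clock)
    leases.open("scan-1")
    for _ in range(3):
        clock.advance(45_000)    # slow client-side work on one batch...
        leases.renew("scan-1")   # ...with a renewal partway through
        clock.advance(45_000)
        leases.next("scan-1")    # 90 s between next() calls, no expiry
    print("long scan survived with manual renewals")
```

Run as a script, the loop leaves 90 seconds between `next()` calls against a 60-second lease and still completes, because each gap is split by a renewal. In this model a client that stops renewing still loses its scanner after 60 seconds, which is the advantage over raising the global period for everyone.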

bq. Even being able to override the timeout on a per-scan basis would be
nice.

Agreed.
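The per-scan override could be modeled as a lease table in which each scanner optionally carries its own timeout and otherwise falls back to the cluster-wide period. This is a hypothetical sketch: `LeaseTable`, `ScannerLease`, `open(..., timeout_ms=...)`, and `reap()` are illustrative names, not the HBase API. The `reap()` step stands in for the server discarding abandoned scanners, which is exactly what a long global timeout delays for every scanner on the cluster.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ScannerLease:
    last_touch_ms: int
    timeout_ms: Optional[int] = None  # None: use the cluster-wide period


class LeaseTable:
    """Hypothetical lease table with an optional per-scanner timeout."""

    def __init__(self, global_period_ms: int):
        self.global_period_ms = global_period_ms
        self.scanners: Dict[int, ScannerLease] = {}

    def open(self, scanner_id: int, now_ms: int,
             timeout_ms: Optional[int] = None) -> None:
        self.scanners[scanner_id] = ScannerLease(now_ms, timeout_ms)

    def touch(self, scanner_id: int, now_ms: int) -> None:
        # Called on every next(); restarts this scanner's lease.
        self.scanners[scanner_id].last_touch_ms = now_ms

    def effective_timeout(self, scanner_id: int) -> int:
        lease = self.scanners[scanner_id]
        if lease.timeout_ms is not None:
            return lease.timeout_ms
        return self.global_period_ms

    def expired(self, scanner_id: int, now_ms: int) -> bool:
        lease = self.scanners[scanner_id]
        return now_ms - lease.last_touch_ms > self.effective_timeout(scanner_id)

    def reap(self, now_ms: int) -> List[int]:
        """Discard expired scanners (freeing their state); return their ids."""
        dead = [sid for sid in self.scanners if self.expired(sid, now_ms)]
        for sid in dead:
            del self.scanners[sid]
        return dead


if __name__ == "__main__":
    table = LeaseTable(global_period_ms=60_000)
    table.open(1, now_ms=0)                      # default 60 s lease
    table.open(2, now_ms=0, timeout_ms=600_000)  # this scan asked for 10 min
    print(table.reap(now_ms=66_698))             # only scanner 1 is reaped
```

With a 60-second default, the scanner opened with a 10-minute override survives a 66,698 ms gap like the one in Dan's error, while a default scanner opened at the same time is reaped. Only scans that ask for a longer lease hold server-side state longer.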

On Wed, Mar 20, 2013 at 10:05 AM, Bryan Beaudreault <[EMAIL PROTECTED]> wrote:

> Typically it is better to use caching and batch size to limit the number of
> rows returned and thus the amount of processing required between calls to
> next() during a scan, but it would be nice if HBase provided a way to
> manually refresh a lease similar to Hadoop's context.progress().  In a
> cluster that is used for many different applications, upping the global
> lease timeout is a heavy-handed solution.  Even being able to override the
> timeout on a per-scan basis would be nice.
>
> Thoughts on that, Ted?
>
>
> On Wed, Mar 20, 2013 at 1:00 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > In 0.94, there is only one setting.
> > See the release notes of HBASE-6170, which is in 0.95.
> >
> > Looks like this should help (in 0.95):
> >
> > https://issues.apache.org/jira/browse/HBASE-2214
> > Do HBASE-1996 -- setting size to return in scan rather than count of rows
> > -- properly
> >
> > From your description, you should be able to raise the timeout since the
> > writes are relatively fast.
> >
> > Cheers
> >
> > On Wed, Mar 20, 2013 at 9:32 AM, Dan Crosta <[EMAIL PROTECTED]> wrote:
> >
> > > I'm confused: I only see one setting in CDH manager. What is the name
> > > of the other setting?
> > >
> > > Our load consists of moderately frequent small writes (in batches of
> > > 1000 cells at a time, typically split over a few hundred rows; these
> > > complete very fast and we haven't seen any timeouts there) and
> > > infrequent batches of large reads (scans), which is where we do see
> > > timeouts. My guess is that the timeouts are due more to our application
> > > taking some time (apparently more than 60s) to process each batch of
> > > scan results than to slowness in HBase itself, which tends to be only
> > > moderately loaded (judging by CPU, network, and disk) while we do the
> > > reads.
> > >
> > > Thanks,
> > > - Dan
> > >
> > > On Mar 17, 2013, at 2:20 PM, Ted Yu wrote:
> > >
> > > > The lease timeout is used by row locking too.
> > > > That's the reason behind splitting the setting into two config
> > > > parameters.
> > > >
> > > > What is your load composition? Do you mostly serve reads from HBase?
> > > >
> > > > Cheers
> > > >
> > > > On Sun, Mar 17, 2013 at 1:56 PM, Dan Crosta <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> Ah, thanks Ted -- I was wondering what that setting was for.
> > > >>
> > > >> We are using CDH 4.2.0, which is HBase 0.94.2 (give or take a few
> > > >> backports from 0.94.3).
> > > >>
> > > >> Is there any harm in setting the lease timeout to something larger,
> > > >> like 5 or 10 minutes?
> > > >>
> > > >> Thanks,
> > > >> - Dan
> > > >>
> > > >> On Mar 17, 2013, at 1:46 PM, Ted Yu wrote:
> > > >>
> > > >>> Which HBase version are you using?
> > > >>>
> > > >>> In 0.94 and prior, the config param is hbase.regionserver.lease.period
> > > >>>
> > > >>> In 0.95 it is different; see the release notes of HBASE-6170.
> > > >>>
> > > >>> On Sun, Mar 17, 2013 at 11:46 AM, Dan Crosta <[EMAIL PROTECTED]> wrote:
> > > >>>
> > > >>>> We occasionally get scanner timeout errors such as "66698ms passed
> > > >>>> since the last invocation, timeout is currently set to 60000" when
> > > >>>> iterating a scanner through the Thrift API. Is there any reason not
> > > >>>> to raise the timeout above the default of 60s? Put another way,
> > > >>>> what resources (and how much of each) does a scanner take up on a
> > > >>>> Thrift server or region server?
> > > >>>>
> > > >>>> Also, to confirm: I believe "hbase.rpc.timeout" is the setting in
> > > >>>> question here, but someone please correct me if I'm wrong.
> > > >>>>
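Bryan's advice to tune caching and batch size, and Dan's guess that the application rather than HBase is slow, both come down to one piece of arithmetic: the lease only sees the gap between two `next()` calls, and that gap is roughly the number of rows fetched per call times the client's per-row processing time. The sketch below picks a caching value (set, for example, with `Scan.setCaching` in the Java client) that leaves headroom under the lease period. It assumes a constant per-row cost and negligible RPC latency, both simplifications; the function names and the 50% headroom default are illustrative assumptions.

```python
import math


def gap_between_next_calls_ms(caching: int, per_row_ms: float) -> float:
    """Approximate time between two next() calls seen by the lease.

    Assumes the client processes every fetched row before asking for the
    next batch, at a constant cost per row, and ignores RPC latency.
    """
    return caching * per_row_ms


def would_time_out(caching: int, per_row_ms: float, lease_period_ms: int) -> bool:
    # Mirrors the check implied by the error message: elapsed > period.
    return gap_between_next_calls_ms(caching, per_row_ms) > lease_period_ms


def max_safe_caching(lease_period_ms: int, per_row_ms: float,
                     headroom: float = 0.5) -> int:
    """Largest caching value whose processing fits in `headroom` of the lease.

    headroom=0.5 is an arbitrary safety margin. Returns 0 when even a single
    row exceeds the budget, i.e. no caching value helps.
    """
    if per_row_ms <= 0:
        raise ValueError("per_row_ms must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return math.floor(lease_period_ms * headroom / per_row_ms)


if __name__ == "__main__":
    lease = 60_000       # the 60 s default from the error message
    per_row = 100.0      # hypothetical client cost per row, in ms
    print(would_time_out(1000, per_row, lease))  # True: 100 s between calls
    print(max_safe_caching(lease, per_row))      # 300
```

At a hypothetical 100 ms of processing per row, fetching 1000 rows per call leaves 100 s between calls, well past a 60 s lease, while 300 rows fits in half of it. When `max_safe_caching` returns 0, even one row takes longer than the budget, and only a longer lease (or a per-scan timeout or manual lease refresh, as requested in the thread) would help.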