HBase >> mail # user >> Scanner timeout -- any reason not to raise?


Re: Scanner timeout -- any reason not to raise?
I was thinking something like this:

Scan scan = new Scan(startRow, endRow);

// Caching is based on what we expect most rows to take for processing time.
scan.setCaching(someVal);

ResultScanner scanner = table.getScanner(scan);

for (Result r : scanner) {

  // Usual processing, the time for which we accounted for in our caching
  // and global lease timeout settings.

  if (someCondition) {

    // More time-intensive processing is necessary on this record, which is
    // hard to account for in the caching.

    // Proposed call, not an existing HBase API: refresh the scanner lease.
    scanner.progress();

  }

}
--
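
To outline how a manual refresh would help a long scan, here is a toy model of a lease, under stated assumptions. It is plain Java, not HBase code: LeaseSketch, renew() and survives() are hypothetical names, and renew() stands in for whatever the proposed scanner.progress() would do on the region server. The 60s timeout and 45s steps are illustrative numbers only.

```java
/** Toy model of a scanner lease. Hypothetical; this is not HBase code. */
public class LeaseSketch {
    private final long timeoutMs;
    private long lastRenewedMs;

    public LeaseSketch(long timeoutMs, long nowMs) {
        this.timeoutMs = timeoutMs;
        this.lastRenewedMs = nowMs;
    }

    /** Stand-in for what a hypothetical scanner.progress() would trigger. */
    public void renew(long nowMs) {
        lastRenewedMs = nowMs;
    }

    /** The lease expires once more than timeoutMs passes without a renewal. */
    public boolean isExpired(long nowMs) {
        return nowMs - lastRenewedMs > timeoutMs;
    }

    /**
     * Walks through processing steps of the given durations on a simulated
     * clock and reports whether the lease is still alive after the last one.
     */
    public static boolean survives(long timeoutMs, long[] stepDurationsMs,
                                   boolean renewAfterEachStep) {
        LeaseSketch lease = new LeaseSketch(timeoutMs, 0L);
        long now = 0L;
        for (long duration : stepDurationsMs) {
            now += duration;
            if (lease.isExpired(now)) {
                return false;
            }
            if (renewAfterEachStep) {
                lease.renew(now);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] steps = {45_000L, 45_000L, 45_000L};
        // Three 45s steps against a 60s lease.
        System.out.println("without renew: " + survives(60_000L, steps, false)); // false
        System.out.println("with renew: " + survives(60_000L, steps, true));     // true
    }
}
```

In this model each slow record stays under the timeout on its own, so a refresh after each one keeps the lease alive, while the same work without refreshes expires it on the second record.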

I'm not sure how we could expose this in the context of a Hadoop job, since
I don't believe we have access to the underlying scanner, but that would be
great too.
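
For the caching side of the trade-off discussed in this thread, a rough back-of-the-envelope helper might look like the following. It assumes the lease is only refreshed when the client goes back to the server for the next batch, so the time spent processing one batch of cached rows has to stay under the lease timeout. CachingEstimate, maxSafeCaching and the safety factor are hypothetical names for illustration, not part of HBase.

```java
/** Rough sizing helper for Scan.setCaching(). Hypothetical; not part of HBase. */
public class CachingEstimate {

    /**
     * Largest caching value for which processing one full batch is expected
     * to finish within safetyFactor * leaseTimeoutMs.
     *
     * @param leaseTimeoutMs     scanner lease timeout in milliseconds
     * @param perRowProcessingMs expected client-side processing time per row
     * @param safetyFactor       fraction of the timeout to budget, in (0, 1]
     */
    public static int maxSafeCaching(long leaseTimeoutMs, long perRowProcessingMs,
                                     double safetyFactor) {
        if (leaseTimeoutMs <= 0 || perRowProcessingMs <= 0
                || safetyFactor <= 0.0 || safetyFactor > 1.0) {
            throw new IllegalArgumentException("arguments out of range");
        }
        long budgetMs = (long) (leaseTimeoutMs * safetyFactor);
        long rows = budgetMs / perRowProcessingMs;
        // Caching must be at least 1. If this clamps to 1, a single row is
        // already over budget and caching alone cannot prevent the timeout.
        return (int) Math.max(1L, Math.min(rows, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        // 60s lease, 250 ms per row, budget half the timeout.
        System.out.println(maxSafeCaching(60_000L, 250L, 0.5)); // 120
    }
}
```

This only covers the typical row. The occasional record that needs far more time than the estimate is exactly the case that caching cannot account for.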
On Wed, Mar 20, 2013 at 1:11 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> bq.  if HBase provided a way to manually refresh a lease similar to
> Hadoop's context.progress()
>
> Can you outline how the above works for a long scan?
>
> bq. Even being able to override the timeout on a per-scan basis would be
> nice.
>
> Agreed.
>
> On Wed, Mar 20, 2013 at 10:05 AM, Bryan Beaudreault <[EMAIL PROTECTED]> wrote:
>
> > Typically it is better to use caching and batch size to limit the number of
> > rows returned and thus the amount of processing required between calls to
> > next() during a scan, but it would be nice if HBase provided a way to
> > manually refresh a lease similar to Hadoop's context.progress(). In a
> > cluster that is used for many different applications, upping the global
> > lease timeout is a heavy-handed solution. Even being able to override the
> > timeout on a per-scan basis would be nice.
> >
> > Thoughts on that, Ted?
> >
> >
> > On Wed, Mar 20, 2013 at 1:00 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > In 0.94, there is only one setting.
> > > See release notes of HBASE-6170 which is in 0.95
> > >
> > > Looks like this should help (in 0.95):
> > >
> > > https://issues.apache.org/jira/browse/HBASE-2214
> > > Do HBASE-1996 -- setting size to return in scan rather than count of rows
> > > -- properly
> > >
> > > From your description, you should be able to raise the timeout since the
> > > writes are relatively fast.
> > >
> > > Cheers
> > >
> > > On Wed, Mar 20, 2013 at 9:32 AM, Dan Crosta <[EMAIL PROTECTED]> wrote:
> > >
> > > > I'm confused -- I only see one setting in CDH manager, what is the name of
> > > > the other setting?
> > > >
> > > > Our load is moderately frequent small writes (in batches of 1000 cells at
> > > > a time, typically split over a few hundred rows -- these complete very
> > > > fast, we haven't seen any timeouts there), and infrequent batches of large
> > > > reads (scans), which is where we do see timeouts. My guess is that the
> > > > timeout is more due to our application taking some time -- apparently more
> > > > than 60s -- to process the results of each scan's output, rather than due
> > > > to slowness in HBase itself, which tends to be only moderately loaded
> > > > (judging by CPU, network, and disk) while we do the reads.
> > > >
> > > > Thanks,
> > > > - Dan
> > > >
> > > > On Mar 17, 2013, at 2:20 PM, Ted Yu wrote:
> > > >
> > > > > The lease timeout is used by row locking too.
> > > > > That's the reason behind splitting the setting into two config parameters.
> > > > >
> > > > > How is your load composition? Do you mostly serve reads from HBase?
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Sun, Mar 17, 2013 at 1:56 PM, Dan Crosta <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > >> Ah, thanks Ted -- I was wondering what that setting was for.
> > > > >>
> > > > >> We are using CDH 4.2.0, which is HBase 0.94.2 (give or take a few
> > > > >> backports from 0.94.3).
> > > > >>
> > > > >> Is there any harm in setting the lease timeout to something larger, like 5
> > > > >> or 10 minutes?
> > > > >>
> > > > >> Thanks,
> > > > >> - Dan
> > > > >>
> > > > >> On Mar 17, 2013, at 1:46 PM, Ted Yu wrote:
> > > > >>
> > >