HBase >> mail # user >> Scanner timeout -- any reason not to raise?


Re: Scanner timeout -- any reason not to raise?
Bryan:
Interesting idea.

You could file a JIRA covering the two suggestions below.
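For readers skimming the thread: the workaround discussed below is to size Scan.setCaching() so that processing one batch of rows always finishes inside the lease timeout. Scan.setCaching() is real HBase client API; the sizing helper below (class and method names are mine, not from the thread) is just back-of-the-envelope arithmetic against the 0.94-era default lease of 60s:

```java
public class ScanCachingSizer {
    /**
     * Largest caching value that keeps a batch's client-side processing
     * under the lease timeout, with a safety margin (0.5 = use half the
     * timeout as budget). Plain arithmetic, no HBase dependency.
     */
    static int maxSafeCaching(long leaseTimeoutMs, long perRowProcessingMs,
                              double safetyMargin) {
        long budgetMs = (long) (leaseTimeoutMs * safetyMargin);
        return (int) Math.max(1, budgetMs / perRowProcessingMs);
    }

    public static void main(String[] args) {
        // Default 0.94 lease timeout is 60s; suppose each row takes ~500 ms
        // of application processing between next() calls.
        int caching = maxSafeCaching(60_000, 500, 0.5);
        System.out.println("scan.setCaching(" + caching + ")"); // 60 rows/batch
    }
}
```

With that value, each next() RPC fetches a batch small enough that the lease is renewed well before it expires. Note this budgeting breaks down exactly in the case Bryan raises below: when occasional rows take far longer than the typical per-row time.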

On Wed, Mar 20, 2013 at 10:39 AM, Bryan Beaudreault <
[EMAIL PROTECTED]> wrote:

> I was thinking something like this:
>
> Scan scan = new Scan(startRow, endRow);
>
> // caching chosen based on what we expect most rows to take for
> // processing time
> scan.setCaching(someVal);
>
> ResultScanner scanner = table.getScanner(scan);
>
> for (Result r : scanner) {
>   // usual processing, the time for which we accounted for in our
>   // caching and global lease timeout settings
>
>   if (someCondition) {
>     // More time-intensive processing necessary on this record, which
>     // is hard to account for in the caching
>     scanner.progress();
>   }
> }
>
>
> --
>
> I'm not sure how we could expose this in the context of a hadoop job, since
> I don't believe we have access to the underlying scanner, but that would be
> great also.
>
>
> On Wed, Mar 20, 2013 at 1:11 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > bq.  if HBase provided a way to manually refresh a lease similar to
> > Hadoop's context.progress()
> >
> > Can you outline how the above works for a long scan?
> >
> > bq. Even being able to override the timeout on a per-scan basis would be
> > nice.
> >
> > Agreed.
> >
> > On Wed, Mar 20, 2013 at 10:05 AM, Bryan Beaudreault <
> > [EMAIL PROTECTED]> wrote:
> >
> > > Typically it is better to use caching and batch size to limit the
> number
> > of
> > > rows returned and thus the amount of processing required between calls
> to
> > > next() during a scan, but it would be nice if HBase provided a way to
> > > manually refresh a lease similar to Hadoop's context.progress().  In a
> > > cluster that is used for many different applications, upping the global
> > > lease timeout is a heavy-handed solution.  Even being able to override
> > the
> > > timeout on a per-scan basis would be nice.
> > >
> > > Thoughts on that, Ted?
> > >
> > >
> > > On Wed, Mar 20, 2013 at 1:00 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > >
> > > > In 0.94, there is only one setting.
> > > > See the release notes of HBASE-6170, which is in 0.95.
> > > >
> > > > Looks like this should help (in 0.95):
> > > >
> > > > https://issues.apache.org/jira/browse/HBASE-2214
> > > > Do HBASE-1996 -- setting size to return in scan rather than count of
> > rows
> > > > -- properly
> > > >
> > > > From your description, you should be able to raise the timeout since
> > the
> > > > writes are relatively fast.
> > > >
> > > > Cheers
> > > >
> > > > On Wed, Mar 20, 2013 at 9:32 AM, Dan Crosta <[EMAIL PROTECTED]>
> wrote:
> > > >
> > > > > I'm confused -- I only see one setting in CDH Manager. What is the
> > name
> > > > of
> > > > > the other setting?
> > > > >
> > > > > Our load is moderately frequent small writes (in batches of 1000
> > cells
> > > at
> > > > > a time, typically split over a few hundred rows -- these complete
> > very
> > > > > fast, we haven't seen any timeouts there), and infrequent batches
> of
> > > > large
> > > > > reads (scans), which is where we do see timeouts. My guess is that
> > the
> > > > > timeout is more due to our application taking some time --
> apparently
> > > > more
> > > > > than 60s -- to process the results of each scan's output, rather
> than
> > > due
> > > > > to slowness in HBase itself, which tends to be only moderately
> loaded
> > > > > (judging by CPU, network, and disk) while we do the reads.
> > > > >
> > > > > Thanks,
> > > > > - Dan
> > > > >
> > > > > On Mar 17, 2013, at 2:20 PM, Ted Yu wrote:
> > > > >
> > > > > > The lease timeout is used by row locking too.
> > > > > > That's the reason behind splitting the setting into two config
> > > > > parameters.
> > > > > >
> > > > > > What is your load composition? Do you mostly serve reads from
> > > > > > HBase?
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > On Sun, Mar 17, 2013 at 1:56 PM, Dan Crosta <[EMAIL PROTECTED]>
> > > wrote:
> > > > > >
> > > > > >> Ah, thanks Ted -- I was wondering what that setting was for.
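If raising the timeout is the route taken, a sketch of the relevant hbase-site.xml properties, pieced together from this thread: in 0.94 the single knob is hbase.regionserver.lease.period (default 60000 ms), which, as Ted notes, also governs row locking; HBASE-6170 (0.95+) gives the scanner timeout its own setting, hbase.client.scanner.timeout.period. Values here are illustrative, not recommendations:

```xml
<!-- 0.94: a single knob covers both scanner leases and row locks -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>120000</value> <!-- ms; default is 60000 -->
</property>

<!-- 0.95+ (per HBASE-6170): scanner timeout has its own setting -->
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>120000</value>
</property>
```

As the thread points out, raising the global value is heavy-handed on a shared cluster, since it lengthens how long dead clients pin server-side resources for everyone.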