HBase user mailing list: Scanner timeout -- any reason not to raise?


Dan Crosta 2013-03-17, 18:46
Ted Yu 2013-03-17, 20:46
Dan Crosta 2013-03-17, 20:56
Ted Yu 2013-03-17, 21:20
Dan Crosta 2013-03-20, 16:32
Ted Yu 2013-03-20, 17:00
Bryan Beaudreault 2013-03-20, 17:05
Ted Yu 2013-03-20, 17:11
Bryan Beaudreault 2013-03-20, 17:39
Ted Yu 2013-03-20, 17:56

Re: Scanner timeout -- any reason not to raise?
Thanks Ted, I've submitted https://issues.apache.org/jira/browse/HBASE-8157.

On Wed, Mar 20, 2013 at 1:56 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Bryan:
> Interesting idea.
>
> You can log a JIRA with the following two suggestions.
>
> On Wed, Mar 20, 2013 at 10:39 AM, Bryan Beaudreault <[EMAIL PROTECTED]> wrote:
>
> > I was thinking something like this:
> >
> > Scan scan = new Scan(startRow, endRow);
> > // based on what we expect most rows to take for processing time
> > scan.setCaching(someVal);
> >
> > ResultScanner scanner = table.getScanner(scan);
> > for (Result r : scanner) {
> >   // usual processing, the time for which we accounted in our caching
> >   // and global lease timeout settings
> >   if (someCondition) {
> >     // more time-intensive processing necessary on this record, which is
> >     // hard to account for in the caching
> >     scanner.progress();
> >   }
> > }
> >
> > --
> >
> > I'm not sure how we could expose this in the context of a Hadoop job,
> > since I don't believe we have access to the underlying scanner, but that
> > would be great also.
> >
> >
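Bryan's proposal amounts to letting the client renew the scanner lease on demand, instead of only when it fetches the next batch of rows. The toy model below illustrates that rule under stated assumptions; it is not HBase's lease code. The class `ScannerLease`, its `renew()` and `isExpired()` methods, and the injectable millisecond clock are all hypothetical, and `scanner.progress()` in the proposal above is a suggested call, not part of the 0.94 client API.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical model of a scanner lease: every renewal pushes the deadline
 * leaseMs into the future; once the clock passes the deadline without a
 * renewal, the lease is expired and cannot be revived.
 */
public final class ScannerLease {
    private final long leaseMs;
    private final LongSupplier clockMs;
    private long deadlineMs;

    ScannerLease(long leaseMs, LongSupplier clockMs) {
        this.leaseMs = leaseMs;
        this.clockMs = clockMs;
        this.deadlineMs = clockMs.getAsLong() + leaseMs;
    }

    /** Stands in for both a next() RPC and the proposed progress() call. */
    void renew() {
        if (isExpired()) {
            throw new IllegalStateException("lease already expired");
        }
        deadlineMs = clockMs.getAsLong() + leaseMs;
    }

    boolean isExpired() {
        return clockMs.getAsLong() > deadlineMs;
    }

    public static void main(String[] args) {
        long[] now = {0};                       // fake clock, in milliseconds
        ScannerLease lease = new ScannerLease(60_000, () -> now[0]);
        now[0] = 45_000;                        // slow work on one row...
        lease.renew();                          // ...with a progress()-style renewal
        now[0] = 90_000;                        // 45 s after the renewal
        System.out.println(lease.isExpired());  // false
        now[0] = 106_000;                       // 61 s after the renewal
        System.out.println(lease.isExpired());  // true
    }
}
```

The point of the model is that the lease only cares about the gap since the last renewal, so a mid-batch renewal would let one slow record exceed the lease period without raising the timeout for every scan on the cluster.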
> > On Wed, Mar 20, 2013 at 1:11 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > bq. if HBase provided a way to manually refresh a lease similar to
> > > Hadoop's context.progress()
> > >
> > > Can you outline how the above would work for a long scan?
> > >
> > > bq. Even being able to override the timeout on a per-scan basis would
> > > be nice.
> > >
> > > Agreed.
> > >
> > > On Wed, Mar 20, 2013 at 10:05 AM, Bryan Beaudreault <[EMAIL PROTECTED]> wrote:
> > >
> > > > Typically it is better to use caching and batch size to limit the
> > > > number of rows returned, and thus the amount of processing required
> > > > between calls to next() during a scan, but it would be nice if HBase
> > > > provided a way to manually refresh a lease, similar to Hadoop's
> > > > context.progress(). In a cluster that is used for many different
> > > > applications, upping the global lease timeout is a heavy-handed
> > > > solution. Even being able to override the timeout on a per-scan basis
> > > > would be nice.
> > > >
> > > > Thoughts on that, Ted?
> > > >
> > > >
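The caching advice above reduces to simple arithmetic: the lease is renewed each time the client fetches a batch from the region server, so the application's work on one batch of `caching` rows has to finish within the lease period. A minimal sketch, assuming a roughly constant per-row processing time and a safety factor of your choosing; `maxSafeCaching` is a hypothetical helper whose result would be passed to `Scan.setCaching(int)`.

```java
public final class ScanCachingBudget {
    /**
     * Hypothetical helper: the largest caching value for which processing one
     * full batch is expected to take at most leaseMs * safety milliseconds.
     */
    static int maxSafeCaching(long leaseMs, double perRowMs, double safety) {
        if (leaseMs <= 0 || perRowMs <= 0 || safety <= 0 || safety > 1) {
            throw new IllegalArgumentException("invalid budget parameters");
        }
        long rows = (long) Math.floor(leaseMs * safety / perRowMs);
        // Fetch at least one row per RPC, and stay within int range.
        return (int) Math.max(1, Math.min(rows, Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        // Assumed numbers: 60 s lease, about 250 ms of application work per row,
        // and only half of the lease budgeted to leave room for slow outliers.
        int caching = maxSafeCaching(60_000, 250.0, 0.5);
        System.out.println("caching = " + caching); // caching = 120
    }
}
```

Because the budget depends on the average per-row cost, a few unusually expensive rows can still blow through the lease, which is the case the per-scan timeout or manual renewal discussed in this thread is meant to cover.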
> > > > On Wed, Mar 20, 2013 at 1:00 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > In 0.94, there is only one setting.
> > > > > See the release notes of HBASE-6170, which is in 0.95.
> > > > >
> > > > > Looks like this should help (in 0.95):
> > > > >
> > > > > https://issues.apache.org/jira/browse/HBASE-2214
> > > > > Do HBASE-1996 -- setting size to return in scan rather than count of
> > > > > rows -- properly
> > > > >
> > > > > From your description, you should be able to raise the timeout, since
> > > > > the writes are relatively fast.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Wed, Mar 20, 2013 at 9:32 AM, Dan Crosta <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > I'm confused -- I only see one setting in CDH manager. What is the
> > > > > > name of the other setting?
> > > > > >
> > > > > > Our load is moderately frequent small writes (in batches of 1000
> > > > > > cells at a time, typically split over a few hundred rows -- these
> > > > > > complete very fast, and we haven't seen any timeouts there), and
> > > > > > infrequent batches of large reads (scans), which is where we do see
> > > > > > timeouts. My guess is that the timeout is due more to our
> > > > > > application taking some time -- apparently more than 60s -- to
> > > > > > process the results of each scan's output than to slowness in HBase
> > > > > > itself, which tends to be only moderately loaded (judging by CPU,
> > > > > > network, and disk) while we do the reads.
> > > > > >
> > > > > > Thanks,
> > > > > > - Dan
> > > > > >
> > > > > > On Mar 17, 2013, at 2:20 PM, Ted Yu wrote:
> > > > > >
> > > > > > > The lease timeout is used by row locking too.