HBase >> mail # user >> Scanner timeout -- any reason not to raise?


Dan Crosta 2013-03-17, 18:46
Ted Yu 2013-03-17, 20:46
Dan Crosta 2013-03-17, 20:56
Ted Yu 2013-03-17, 21:20
Dan Crosta 2013-03-20, 16:32
Ted Yu 2013-03-20, 17:00
Bryan Beaudreault 2013-03-20, 17:05
Ted Yu 2013-03-20, 17:11
Bryan Beaudreault 2013-03-20, 17:39
Ted Yu 2013-03-20, 17:56
Re: Scanner timeout -- any reason not to raise?
Thanks Ted, I've submitted https://issues.apache.org/jira/browse/HBASE-8157.

On Wed, Mar 20, 2013 at 1:56 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Bryan:
> Interesting idea.
>
> You can log a JIRA with the following two suggestions.
>
> On Wed, Mar 20, 2013 at 10:39 AM, Bryan Beaudreault <[EMAIL PROTECTED]> wrote:
>
> > I was thinking something like this:
> >
> > Scan scan = new Scan(startRow, endRow);
> > // caching chosen from the processing time we expect most rows to take
> > scan.setCaching(someVal);
> >
> > ResultScanner scanner = table.getScanner(scan);
> > for (Result r : scanner) {
> >   // usual processing, the time for which we accounted for in our
> >   // caching and global lease timeout settings
> >
> >   if (someCondition) {
> >     // more time-intensive processing necessary on this record, which
> >     // is hard to account for in the caching
> >     scanner.progress(); // proposed call: manually refresh the lease
> >   }
> > }
> >
> >
> > --
> >
> > I'm not sure how we could expose this in the context of a Hadoop job,
> > since I don't believe we have access to the underlying scanner, but
> > that would be great also.
> >
> >
> > On Wed, Mar 20, 2013 at 1:11 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > bq. if HBase provided a way to manually refresh a lease similar to
> > > Hadoop's context.progress()
> > >
> > > Can you outline how the above would work for a long scan?
> > >
> > > bq. Even being able to override the timeout on a per-scan basis would
> > > be nice.
> > >
> > > Agreed.
> > >
> > > On Wed, Mar 20, 2013 at 10:05 AM, Bryan Beaudreault <[EMAIL PROTECTED]> wrote:
> > >
> > > > Typically it is better to use caching and batch size to limit the
> > > > number of rows returned, and thus the amount of processing required
> > > > between calls to next() during a scan, but it would be nice if HBase
> > > > provided a way to manually refresh a lease similar to Hadoop's
> > > > context.progress(). In a cluster that is used for many different
> > > > applications, upping the global lease timeout is a heavy-handed
> > > > solution. Even being able to override the timeout on a per-scan
> > > > basis would be nice.
> > > >
> > > > Thoughts on that, Ted?
> > > >
> > > >
> > > > On Wed, Mar 20, 2013 at 1:00 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > In 0.94, there is only one setting.
> > > > > See the release notes of HBASE-6170, which is in 0.95.
> > > > >
> > > > > Looks like this should help (in 0.95):
> > > > >
> > > > > https://issues.apache.org/jira/browse/HBASE-2214
> > > > > Do HBASE-1996 -- setting size to return in scan rather than count
> > > > > of rows -- properly
> > > > >
> > > > > From your description, you should be able to raise the timeout
> > > > > since the writes are relatively fast.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Wed, Mar 20, 2013 at 9:32 AM, Dan Crosta <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > I'm confused -- I only see one setting in CDH manager, what is
> > > > > > the name of the other setting?
> > > > > >
> > > > > > Our load is moderately frequent small writes (in batches of 1000
> > > > > > cells at a time, typically split over a few hundred rows -- these
> > > > > > complete very fast, we haven't seen any timeouts there), and
> > > > > > infrequent batches of large reads (scans), which is where we do
> > > > > > see timeouts. My guess is that the timeout is more due to our
> > > > > > application taking some time -- apparently more than 60s -- to
> > > > > > process the results of each scan's output, rather than due to
> > > > > > slowness in HBase itself, which tends to be only moderately
> > > > > > loaded (judging by CPU, network, and disk) while we do the reads.
> > > > > >
> > > > > > Thanks,
> > > > > > - Dan
> > > > > >
> > > > > > On Mar 17, 2013, at 2:20 PM, Ted Yu wrote:
> > > > > >
> > > > > > > The lease timeout is used by row locking too.
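
A note on the caching heuristic discussed in this thread: the scanner lease is renewed when the client goes back to the region server for the next batch, so the work done on one batch of `caching` rows has to finish inside the lease timeout. The sketch below only turns that into arithmetic. It is an illustration, not HBase code: the class name, method names, and safety factor are assumptions made here, and the 60 s lease is the figure Dan mentions.

```java
/** Hypothetical helper (not part of HBase): sizes scanner caching against a lease timeout. */
final class ScanCachingEstimate {

    private ScanCachingEstimate() {}

    /**
     * Largest caching value for which one batch of rows should be processed
     * within a fraction (safetyFactor, in (0, 1]) of the lease timeout.
     * Always returns at least 1, since a scan must fetch at least one row per call.
     */
    static int maxCaching(long leaseTimeoutMs, long perRowMs, double safetyFactor) {
        if (leaseTimeoutMs <= 0 || perRowMs <= 0
                || safetyFactor <= 0.0 || safetyFactor > 1.0) {
            throw new IllegalArgumentException("timeouts must be > 0 and safetyFactor in (0, 1]");
        }
        long budgetMs = (long) (leaseTimeoutMs * safetyFactor); // time we allow per batch
        long rows = budgetMs / perRowMs;                        // rows that fit in the budget
        return (int) Math.max(1L, Math.min(rows, Integer.MAX_VALUE));
    }

    /** True if processing a full batch is expected to take at least the lease timeout. */
    static boolean batchExceedsLease(int caching, long perRowMs, long leaseTimeoutMs) {
        return Math.multiplyExact((long) caching, perRowMs) >= leaseTimeoutMs;
    }

    public static void main(String[] args) {
        long leaseMs = 60_000L; // the 60 s timeout mentioned in the thread

        // 50 ms per row, using half the lease as budget: 30000 / 50 = 600 rows
        System.out.println(maxCaching(leaseMs, 50L, 0.5));

        // 1000 rows at 100 ms each is 100 s of work, longer than the lease
        System.out.println(batchExceedsLease(1000, 100L, leaseMs));
    }
}
```

The result would be passed to `scan.setCaching(...)`. An estimate like this only covers the typical row; a single record that needs much longer than average, the case Bryan raises, still overruns the lease, which is why the thread asks for a per-scan timeout or a manual lease refresh.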