HBase >> mail # user >> A possible bug in the scanner.


Vidhyashankar Venkatarama... 2011-04-13, 07:40
Ted Yu 2011-04-13, 08:44
Vidhyashankar Venkatarama... 2011-04-13, 14:44
Gary Helmling 2011-04-13, 16:15
Jean-Daniel Cryans 2011-04-13, 16:42
Gary Helmling 2011-04-13, 17:05
Jean-Daniel Cryans 2011-04-13, 17:18
Vidhyashankar Venkatarama... 2011-04-13, 17:03
Gary Helmling 2011-04-13, 17:14
Himanshu Vashishtha 2011-04-13, 15:43
Vidhyashankar Venkatarama... 2011-04-13, 17:47

Re: A possible bug in the scanner.
Vidhya, yes: for huge files with valid rows, the time range approach will
not be effective, nor will it help when a scanner hangs in its next() calls
because of a GC pause or some exhaustive computation. I voted for this
answer after reading your initial mail (but it got posted after a delay of
3 hours, I don't know why), and a lot of other facts were revealed during
that time frame :), like JIRA 2077.
Good learning for me though :)

Thanks,
Himanshu
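
To make the point above concrete, here is a minimal sketch of time-range
based store file selection. It is a toy model, not HBase code: FileMeta,
TimeRangeSketch and filesToRead are hypothetical names, and each store file
is assumed to be summarized by the minimum and maximum timestamp of its
cells. The scan range is treated as half-open, [scanMin, scanMax).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not HBase code: a store file is reduced to the
// minimum and maximum cell timestamp it contains.
final class FileMeta {
    final String name;
    final long minTs;
    final long maxTs;

    FileMeta(String name, long minTs, long maxTs) {
        this.name = name;
        this.minTs = minTs;
        this.maxTs = maxTs;
    }
}

final class TimeRangeSketch {
    // Keep a file only if its [minTs, maxTs] span overlaps the scan's
    // half-open range [scanMin, scanMax).
    static List<String> filesToRead(List<FileMeta> files, long scanMin, long scanMax) {
        List<String> selected = new ArrayList<>();
        for (FileMeta f : files) {
            boolean overlaps = f.maxTs >= scanMin && f.minTs < scanMax;
            if (overlaps) {
                selected.add(f.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<FileMeta> files = new ArrayList<>();
        files.add(new FileMeta("old-only", 100, 200));   // every cell is older than the range
        files.add(new FileMeta("wide-range", 150, 900)); // mostly old cells, a few in range
        files.add(new FileMeta("recent", 800, 950));
        // Prints [wide-range, recent]
        System.out.println(filesToRead(files, 700, 1000));
    }
}
```

In this model the "old-only" file is skipped without being read, which is
the case the time range helps with. The "wide-range" file still has to be
read in full even though most of its cells fall outside the range, which is
the huge-file case where the time range does not help.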

On Wed, Apr 13, 2011 at 11:47 AM, Vidhyashankar Venkataraman <
[EMAIL PROTECTED]> wrote:

> Himanshu,
>   Thanks, this will resolve the particular case we ran into. But what if
> the files are huge and have a wide range of timestamps, and only some of
> the records in the file are valid? And for the other application that we
> have, scans with filters that return a sparse set, the solution may not
> help.
>
>   Further, it won't solve the underlying problem. When a scanner is busy,
> but doesn't have any rows to return "yet", neither the client nor the region
> server should mistake it for an unresponsive scanner.
>
> V
>
> On 4/13/11 8:43 AM, "Himanshu Vashishtha" <[EMAIL PROTECTED]> wrote:
>
> Vidhya,
> Did you try setting the scanner time range? It takes min and max
> timestamps, and when the scanner is instantiated at the RS, time-based
> filtering is done to include only the selected store files. Have a look at
> StoreFile.shouldSeek(Scan, SortedSet<byte[]>). I think it should improve
> the response time.
>
> Himanshu
>
> On Wed, Apr 13, 2011 at 8:44 AM, Vidhyashankar Venkataraman <
> [EMAIL PROTECTED]> wrote:
>
> > Hi
> >   We had enabled scanner caching, but I don't think it is the same issue
> > because scanner.next in this case is blocking: the scanner is busy in the
> > region server but hasn't returned anything yet, since a row to be returned
> > hasn't been found yet (all rows have expired but are still there since
> > they haven't been compacted yet).
> >
> > Vidhya
> >
> > On 4/13/11 1:44 AM, "Ted Yu" <[EMAIL PROTECTED]> wrote:
> >
> > Have you read the following thread?
> > "ScannerTimeoutException when a scan enables caching, no exception when
> > it doesn't"
> > Did you enable caching? If not, it is a different issue.
> >
> > On Wed, Apr 13, 2011 at 12:40 AM, Vidhyashankar Venkataraman <
> > [EMAIL PROTECTED]> wrote:
> >
> > > (This could be a known issue. Please let me know if it is).
> > >
> > > We had a set of uncompacted store files in a region. One of the column
> > > families had a store file of 5 GB. The other column families were
> > > pretty small (a few megabytes at most).
> > >
> > > It turned out that all the rows in these files had an expired TTL.
> > > When this region was scanned (which should yield an empty result), we
> > > got scanner timeouts and UnknownScannerExceptions.
> > >
> > > And when we tried scanning the region without the large column family,
> > > the scanner returned safely with no result.
> > >
> > > So, I major compacted it and the scan started working correctly.
> > >
> > > So it looks like timeouts happen if the scanner does not return any
> > > output for a specified time. That isn't exactly the correct thing to
> > > do, because the scanner could indeed be busy while there are simply no
> > > rows yet to return to the client.
> > >
> > > We can try increasing the scanner timeout, but this doesn't resolve the
> > > underlying problem. Is this a known issue?
> > >
> > > Thank you
> > > Vidhya
> > >
> >
> >
>
>
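
The underlying problem raised in this thread (a scanner that is busy
skipping expired rows being mistaken for an unresponsive one) can be
illustrated with a small simulation. This is only a sketch under stated
assumptions, not the HBase lease implementation: ScannerLeaseSketch,
scanSurvives and the renewWhileBusy flag are hypothetical names, time is
simulated with plain counters, and renewing the lease while the scan is
still making progress is just one possible direction rather than a
description of what HBase does.

```java
// Hypothetical sketch, not the HBase lease implementation.
final class ScannerLeaseSketch {
    private final long timeoutMs;
    private long lastRenewedMs;

    ScannerLeaseSketch(long timeoutMs, long nowMs) {
        this.timeoutMs = timeoutMs;
        this.lastRenewedMs = nowMs;
    }

    // Record a sign of life from the scanner.
    void renew(long nowMs) {
        lastRenewedMs = nowMs;
    }

    // The lease is expired once more than timeoutMs has passed since the
    // last renewal.
    boolean isExpired(long nowMs) {
        return nowMs - lastRenewedMs > timeoutMs;
    }

    // Simulates a single next() call that has to skip rowsToSkip expired
    // rows, each costing msPerRow of simulated time, before it can return.
    // Returns true if the lease stayed valid for the whole call.
    static boolean scanSurvives(long timeoutMs, int rowsToSkip, long msPerRow,
                                boolean renewWhileBusy) {
        long now = 0;
        ScannerLeaseSketch lease = new ScannerLeaseSketch(timeoutMs, now);
        for (int i = 0; i < rowsToSkip; i++) {
            now += msPerRow; // simulated work on one expired row
            if (lease.isExpired(now)) {
                return false; // treated as unresponsive although it was busy
            }
            if (renewWhileBusy) {
                lease.renew(now); // progress counts as a sign of life
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 1000 expired rows at 100 ms each is 100 s of work against a 60 s timeout.
        System.out.println(scanSurvives(60_000, 1000, 100, false)); // prints false
        System.out.println(scanSurvives(60_000, 1000, 100, true));  // prints true
    }
}
```

In the first call the lease is renewed only when rows are returned, so a
long stretch of expired rows looks the same as a hung scanner. In the second
call the same amount of work completes because progress itself keeps the
lease alive. Raising the timeout only moves the threshold, which matches the
observation above that it does not resolve the underlying problem.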