HBase >> mail # user >> Many scanner opening


Eugeny Morozov 2012-12-18, 08:01
lars hofhansl 2012-12-19, 02:23
Eugeny Morozov 2012-12-20, 10:32
lars hofhansl 2012-12-20, 18:51

Re: Many scanner opening
Lars,

We tried, but I didn't know there was such a contention issue.
We have two different column families. The first one contains data that is
partially used as a filter, and the actual data lives in the second column family.

So the outer scanner (the first one) goes through the table and picks out the
keys of rows that contain the required data. These keys are then passed to the
inner (second) scanner.
BTW, the second scanner uses FuzzyRowFilter:
http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/
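
A minimal sketch of the fuzzy matching rule, for illustration only. It is plain
Java and does not use HBase; the class FuzzyMatchSketch, its methods, and the
example key layout "????_0373" are hypothetical. It assumes the usual mask
convention (0 = the byte at this position must match, 1 = any byte is accepted).
Rejecting keys shorter than the pattern and ignoring bytes past the pattern are
choices of this sketch, not a statement about HBase's behavior.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the matching rule only; this is NOT HBase's FuzzyRowFilter.
final class FuzzyMatchSketch {

    // mask[i] == 0: rowKey[i] must equal pattern[i].
    // mask[i] == 1: any byte is accepted at position i.
    static boolean matches(byte[] rowKey, byte[] pattern, byte[] mask) {
        if (pattern.length != mask.length) {
            throw new IllegalArgumentException("pattern and mask must have the same length");
        }
        // Sketch-level choice: a key shorter than the pattern never matches.
        if (rowKey.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && rowKey[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    // Helper to build byte keys from ASCII strings in the examples.
    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }
}
```

With the pattern "????_0373" and the mask {1,1,1,1,0,0,0,0,0}, the first four
bytes are free and the suffix "_0373" is fixed, so a key such as "0001_0373" is
accepted while "0001_0374" is rejected.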

We have a pretty small cluster, only 18 mappers, but it looks like that's enough
to get contention =)
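
The workaround mentioned in the quoted message (a larger inner scanner, hence
fewer scanners opened) can be sketched as batching the keys found by the outer
scanner. This is only one possible reading of that workaround: it assumes each
inner scan can take a whole batch of keys in its filter. The code is plain Java
with no HBase calls, and KeyBatchingSketch and partition are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups outer-scanner keys so that one inner scan
// serves a whole batch instead of a single key.
final class KeyBatchingSketch {

    // Splits keys into consecutive batches of at most batchSize elements.
    // Each batch would correspond to one inner scanner being opened.
    static <T> List<List<T>> partition(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            // Copy the sublist so each batch is independent of the input list.
            batches.add(new ArrayList<>(keys.subList(start, end)));
        }
        return batches;
    }
}
```

The number of batches is the number of inner scanners opened, so raising
batchSize lowers it: ten keys need ten scanners at batchSize 1 and three at
batchSize 4.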
On Thu, Dec 20, 2012 at 10:51 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Cool.
>
> You probably made it less likely that your scanners will scan the same
> HFile in parallel.
>
> -- Lars
>
>
>
> ________________________________
>  From: Eugeny Morozov <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Thursday, December 20, 2012 2:32 AM
> Subject: Re: Many scanner opening
>
> Lars,
>
> Cool stuff! Thanks a lot! I'm not sure I can apply the patch, because we're
> using CDH-4.1.1, but increasing the size of the internal scanner does the
> trick: it decreased the number of scanners.
> At least temporarily it's good enough.
>
> Thanks!
>
> On Wed, Dec 19, 2012 at 6:23 AM, lars hofhansl <[EMAIL PROTECTED]>
> wrote:
>
> > You might have run into HBASE-7336.
> > (Not available in any official release, yet)
> >
> > If you're using 0.94 (and probably 0.92) you can just apply this patch
> > (it's safe and simple).
> >
> >
> >
> > ________________________________
> >  From: Eugeny Morozov <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Sent: Tuesday, December 18, 2012 12:01 AM
> > Subject: Many scanner opening
> >
> > Hello!
> >
> > We faced an issue recently that the more map tasks are completed, the
> > longer it takes to complete one more map task.
> >
> > In our architecture we have two scanners to read the table. The first
> > one, called the 'outer' scanner, reads the table and filters some rowkeys.
> > These rowkeys are used as a filter for the second scanner, the 'internal'
> > one. Thus we constantly open the 'internal' scanner with different filters.
> >
> > As an additional symptom, we see that our cluster does practically
> > nothing: there is no CPU load, no disk load, no network traffic, etc. Most
> > of the time that means we are waiting on some locks, but I'm not sure.
> >
> > I would appreciate any ideas or suggestions to understand the case.
> > Thank you in advance.
> > --
> > Evgeny Morozov
> > Developer Grid Dynamics
> > Skype: morozov.evgeny
> > www.griddynamics.com
> > [EMAIL PROTECTED]
> >
>
>
>
> --
> Evgeny Morozov
> Developer Grid Dynamics
> Skype: morozov.evgeny
> www.griddynamics.com
> [EMAIL PROTECTED]
>

--
Evgeny Morozov
Developer Grid Dynamics
Skype: morozov.evgeny
www.griddynamics.com
[EMAIL PROTECTED]
Michael Segel 2012-12-20, 13:14