HBase >> mail # user >> Filter with State


Re: Filter with State
Hi Lars:

That is useful. I appreciate it. The idea about cross-row transactions is an
interesting one.

Can I have an iterator on the client side that gets rows from a coprocessor?
(i.e., filtered rows are streamed to the client application, and the client can
access them via an iterator)

Best Regards,

Jerry
On Thu, Aug 2, 2012 at 12:13 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> The Filter is initialized per Region as part of a RegionScannerImpl.
>
> So as long as all the rows you are interested in are co-located in the same
> region, you can keep that state in the Filter instance.
>
> You can use a custom RegionSplitPolicy to control (to some extent at
> least) how the rows are co-located (KeyPrefixRegionSplitPolicy is an
> example).
>
> I also blogged about this here (in the context of cross row transactions):
> http://hadoop-hbase.blogspot.com/2012/02/limited-cross-row-transactions-in-hbase.html
>
>
> Maybe what you really are looking for are coprocessors?
>
>
> -- Lars
>
>
>
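The per-region state described above can be sketched as follows. This is a minimal, self-contained simulation in plain Java, not HBase code: `Row`, `IncreasingValueFilter`, and `scanRegion` are hypothetical names invented for illustration, and the only behavior assumed is the one stated in the thread, namely that each region scanner gets its own filter instance.

```java
import java.util.ArrayList;
import java.util.List;

public class StatefulFilterSketch {

    /** Hypothetical stand-in for a row: a key plus one numeric column value. */
    static final class Row {
        final String key;
        final int value;

        Row(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Hypothetical filter: keeps a row only if its value exceeds the previous row's value. */
    static final class IncreasingValueFilter {
        // State carried from one row to the next; null until the first row is seen.
        private Integer previous;

        boolean include(Row row) {
            boolean keep = previous == null || row.value > previous;
            previous = row.value;
            return keep;
        }
    }

    /** Simulates scanning one region: a fresh filter instance per region scanner. */
    static List<String> scanRegion(List<Row> region) {
        IncreasingValueFilter filter = new IncreasingValueFilter();
        List<String> kept = new ArrayList<>();
        for (Row row : region) {
            if (filter.include(row)) {
                kept.add(row.key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Row> regionA = List.of(new Row("a1", 5), new Row("a2", 3), new Row("a3", 7));
        List<Row> regionB = List.of(new Row("b1", 6), new Row("b2", 9));

        System.out.println(scanRegion(regionA)); // prints [a1, a3]
        // b1 is kept even though 6 < 7: the state from region A did not carry over.
        System.out.println(scanRegion(regionB)); // prints [b1, b2]
    }
}
```

Within a region the filter sees rows in order and its state is meaningful. At a region boundary the state starts over, which is why dependent rows would need to be co-located in one region for this approach to work.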
> ----- Original Message -----
> From: Jerry Lam <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Cc:
> Sent: Wednesday, August 1, 2012 7:06 PM
> Subject: Re: Filter with State
>
> Hi Lars,
>
> I understand that it is more difficult to carry state across
> regions/servers, but how about within a single region? Knowing that the rows in a
> single region have dependencies, can we have a filter with state? If filters
> don't provide this ability, is there another mechanism in HBase that offers
> this kind of functionality?
>
> I think this is a good feature because it allows efficient scanning of
> dependent rows. Instead of fetching each row to the client side and checking
> whether we should fetch the next row, the filter on the server side handles this
> logic.
>
> Best Regards,
>
> Jerry
>
> Sent from my iPad (sorry for spelling mistakes)
>
> On 2012-08-01, at 21:52, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > The issue here is that different rows can be located in different
> > regions or even different region servers, so no local state will carry over
> > all rows.
> >
> >
> >
> > ----- Original Message -----
> > From: Jerry Lam <[EMAIL PROTECTED]>
> > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> > Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> > Sent: Wednesday, August 1, 2012 5:48 PM
> > Subject: Re: Filter with State
> >
> > Hi St.Ack:
> >
> > The schema cannot be changed to a single row.
> > The API describes "Do not rely on filters carrying state across rows;
> > its not reliable in current hbase as we have no handlers in place for when
> > regions split, close or server crashes." If we manage region splitting
> > ourselves, the split issue doesn't apply. Other failures can be handled
> > at the application level. Does each invocation of scanner.next instantiate
> > a new filter on the server side, even on the same region (i.e., does scanning
> > the same region use the same filter or a different filter depending on the
> > scanner.next calls)?
> >
> > Best Regards,
> >
> > Jerry
> >
> > Sent from my iPad (sorry for spelling mistakes)
> >
> > On 2012-08-01, at 18:44, Stack <[EMAIL PROTECTED]> wrote:
> >
> >> On Wed, Aug 1, 2012 at 10:44 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:
> >>> Hi HBase guru:
> >>>
> >>> In Lars George's talk, he mentions that filters have no state. What if I
> >>> need to scan rows in which the decision to filter one row or not is based
> >>> on the previous row's column values? Any idea how one can implement this
> >>> type of logic?
> >>
> >> You could try carrying state in the client (but if client dies state
> >> dies).
> >>
> >> You can't have scanners carry state across rows.  It says so in the API:
> >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/package-summary.html#package_description
> >> (Whatever about the API, if LarsG says it, it must be so!).
> >>
> >> Here is the issue: If row X is in region A on server 1 there is
> >> nothing to prevent row X+1 from being on region B on server 2.  How do