HBase, mail # user - Behavior of Filter.transform() in FilterList?


Re: Behavior of Filter.transform() in FilterList?
lars hofhansl 2013-07-01, 19:01
It would make sense, but it is not immediately clear how to do so cleanly. We would no longer be able to call transform at the StoreScanner level; we would either have to evaluate the filter multiple times, or require the filters to maintain their last state and apply transform only selectively.

I added transform() a while ago in order to allow a Filter *not* to transform. Before that, we defensively made a copy of the key each time, just in case a Filter (such as KeyOnlyFilter) would modify it. Now this is formalized, and the filter is responsible for making a copy only when needed.
-- Lars
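
For readers of the archive, the behavior discussed in this thread can be sketched with a toy model. This is not HBase code: ToyFilter, Cell, list() and run() are hypothetical names invented for illustration, include() stands in for filterKeyValue(), and the "selective" variant is only one possible reading of the proposal (it re-evaluates each filter before transforming, which is the extra cost mentioned above).

```java
public class TransformDemo {
    /** Toy stand-in for a KeyValue (hypothetical, not the HBase class). */
    public static final class Cell {
        final String family, column, value;
        public Cell(String family, String column, String value) {
            this.family = family; this.column = column; this.value = value;
        }
    }

    /** Toy filter: include() models filterKeyValue(), transform() models Filter.transform(). */
    interface ToyFilter {
        boolean include(Cell c);
        default Cell transform(Cell c) { return c; } // identity unless overridden
    }

    static ToyFilter match(String family, String column) {
        return c -> c.family.equals(family) && c.column.equals(column);
    }

    /** Models KeyOnlyFilter: accepts everything, strips the value in transform(). */
    static final ToyFilter KEY_ONLY = new ToyFilter() {
        public boolean include(Cell c) { return true; }
        public Cell transform(Cell c) { return new Cell(c.family, c.column, ""); }
    };

    /** AND/OR list. If selective is false, transform() runs on every child unconditionally. */
    static ToyFilter list(boolean and, boolean selective, ToyFilter... filters) {
        return new ToyFilter() {
            public boolean include(Cell c) {
                for (ToyFilter f : filters) {
                    if (f.include(c) != and) return !and; // short-circuit
                }
                return and;
            }
            public Cell transform(Cell c) {
                Cell out = c;
                for (ToyFilter f : filters) {
                    // Selective mode re-evaluates the child on the original cell.
                    if (!selective || f.include(c)) out = f.transform(out);
                }
                return out;
            }
        };
    }

    /** (family=X and column=Y and KeyOnly) or (family=A and column=B); null if filtered out. */
    public static String run(boolean selective, Cell c) {
        ToyFilter f = list(false, selective,
                list(true, selective, match("X", "Y"), KEY_ONLY),
                list(true, selective, match("A", "B")));
        return f.include(c) ? f.transform(c).value : null;
    }

    public static void main(String[] args) {
        Cell ab = new Cell("A", "B", "v");
        System.out.println("unconditional: [" + run(false, ab) + "]"); // value stripped
        System.out.println("selective:     [" + run(true, ab) + "]");  // value kept
    }
}
```

In the unconditional mode, the A:B cell loses its value even though the branch containing the key-only filter rejected it; in the selective mode it keeps its value, at the price of evaluating each child filter a second time.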

________________________________
 From: Christophe Taton <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
Sent: Monday, July 1, 2013 10:27 AM
Subject: Re: Behavior of Filter.transform() in FilterList?
 
On Mon, Jul 1, 2013 at 4:14 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

>You want transform to only be called on filters that are "reached"?
>I.e., with FilterA AND FilterB, FilterB.transform should not be called if a KV is already filtered by FilterA?
>

Yes, that's what I naively expected, at first.

>That's not how it works right now; transform is called in a completely different code path from the actual filtering logic.
>

Indeed, I just learned that.
I found no documentation of this behavior; did I miss it?
In particular, the javadoc of the workflow of Filter doesn't mention transform() at all.
Would it make sense to apply transform() only if the return code for filterKeyValue() includes the KeyValue?

C.

>-- Lars
>
>
>----- Original Message -----
>From: Christophe Taton <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED]
>Cc:
>Sent: Sunday, June 30, 2013 10:26 PM
>Subject: Re: Behavior of Filter.transform() in FilterList?
>
>On Sun, Jun 30, 2013 at 10:15 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
>> The clause 'family=X and column=Y and KeyOnlyFilter' would be represented
>> by a FilterList, right?
>> (family=A and column=B) would be represented by another FilterList.
>>
>
>Yes, that would be FilterList(OR, [FilterList(AND, [family=X, column=Y,
>KeyOnlyFilter]), FilterList(AND, [family=A, column=B])]).
>
>So the behavior is expected.
>>
>
>Could you explain? I'm not sure how you reached this conclusion.
>Are you saying it is expected, given the actual implementation of
>FilterList.transform()?
>Or are there some other details I missed?
>
>Thanks!
>C.
>
>On Mon, Jul 1, 2013 at 1:10 PM, Christophe Taton <[EMAIL PROTECTED]> wrote:
>>
>> > Hi,
>> >
>> > From
>> > https://github.com/apache/hbase/blob/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java#L183,
>> > it appears that Filter.transform() is invoked unconditionally on all
>> > filters in a FilterList hierarchy.
>> >
>> > This is quite confusing, especially since I may construct a filter like:
>> >     (family=X and column=Y and KeyOnlyFilter) or (family=A and column=B)
>> > The KeyOnlyFilter will remove all values from the KeyValues in A:B as well.
>> >
>> > Is my understanding correct? Is this an expected/intended behavior?
>> >
>> > Thanks,
>> > C.
>> >
>>
>
>