HBase, mail # user - FilterList: possible bug in getNextKeyHint

Re: FilterList: possible bug in getNextKeyHint
Ted Yu 2013-07-29, 17:49
Looking at FilterList#filterKeyValue() and FilterList#getNextKeyHint(), I see
that both iterate through all the filters.

Suppose the FilterList contains three or more filters that implement
getNextKeyHint(). How would the state be maintained?
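
To make the scenario concrete, here is a minimal, self-contained Java sketch of the behavior described in the quoted message below: a MUST_PASS_ALL list whose getNextKeyHint asks every filter for a hint and keeps the largest one, with no record of which filter actually requested the seek. It is a simplified model, not HBase code. Key, RowPrefixFilter (standing in for FuzzyRowFilter), ColumnRange (standing in for ColumnRangeFilter) and MaxHintFilterList are hypothetical names, and keys are plain row/column strings rather than KeyValues.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model (NOT the HBase API): a MUST_PASS_ALL list
// that asks every filter for a hint and keeps the largest one.
class MaxHintDemo {
    enum ReturnCode { INCLUDE, SKIP, SEEK_NEXT_USING_HINT }

    // Cell coordinate ordered by row, then column (stand-in for KeyValue).
    static final class Key implements Comparable<Key> {
        final String row, col;
        Key(String row, String col) { this.row = row; this.col = col; }
        public int compareTo(Key o) {
            int c = row.compareTo(o.row);
            return c != 0 ? c : col.compareTo(o.col);
        }
    }

    interface Filter {
        ReturnCode filterKeyValue(Key k);
        Key getNextKeyHint(Key k); // null means "no hint"
    }

    // FuzzyRowFilter stand-in: includes rows starting with "r1", otherwise asks
    // for a seek; its hint always points to the smallest row after the current one.
    static final class RowPrefixFilter implements Filter {
        public ReturnCode filterKeyValue(Key k) {
            return k.row.startsWith("r1") ? ReturnCode.INCLUDE : ReturnCode.SEEK_NEXT_USING_HINT;
        }
        public Key getNextKeyHint(Key k) { return new Key(k.row + "\0", ""); }
    }

    // ColumnRangeFilter stand-in: keeps columns in [min, max) and hints a seek
    // to column min of the current row.
    static final class ColumnRange implements Filter {
        final String min, max;
        ColumnRange(String min, String max) { this.min = min; this.max = max; }
        public ReturnCode filterKeyValue(Key k) {
            if (k.col.compareTo(min) < 0) return ReturnCode.SEEK_NEXT_USING_HINT;
            return k.col.compareTo(max) < 0 ? ReturnCode.INCLUDE : ReturnCode.SKIP;
        }
        public Key getNextKeyHint(Key k) { return new Key(k.row, min); }
    }

    static final class MaxHintFilterList implements Filter {
        final List<Filter> filters;
        MaxHintFilterList(List<Filter> filters) { this.filters = filters; }

        // MUST_PASS_ALL: the first filter that does not INCLUDE decides.
        public ReturnCode filterKeyValue(Key k) {
            for (Filter f : filters) {
                ReturnCode rc = f.filterKeyValue(k);
                if (rc != ReturnCode.INCLUDE) return rc;
            }
            return ReturnCode.INCLUDE;
        }

        // Keeps the max hint over ALL filters, including those that did not ask
        // for a seek on this key; no per-filter state is remembered.
        public Key getNextKeyHint(Key k) {
            Key max = null;
            for (Filter f : filters) {
                Key h = f.getNextKeyHint(k);
                if (h != null && (max == null || h.compareTo(max) > 0)) max = h;
            }
            return max;
        }
    }

    static MaxHintFilterList demoList() {
        return new MaxHintFilterList(
            Arrays.<Filter>asList(new RowPrefixFilter(), new ColumnRange("c", "f")));
    }

    public static void main(String[] args) {
        MaxHintFilterList list = demoList();
        Key cur = new Key("r1", "a");
        System.out.println(list.filterKeyValue(cur)); // SEEK_NEXT_USING_HINT
        // The row filter's hint wins, so the seek jumps past row r1 entirely.
        System.out.println(list.getNextKeyHint(cur).row.equals("r1")); // false
    }
}
```

With the current key at row r1, column a, ColumnRange asks for a seek to r1/c, but the row filter's hint points past row r1 and wins the max comparison, so the r1 columns in [c, f) are never visited.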

Cheers

On Sun, Jul 28, 2013 at 9:22 PM, Viral Bajaria <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I hit a weird issue (possibly a bug) and am able to reproduce it
> consistently. The problem arises when a FilterList has two filters that
> each implement the getNextKeyHint method.
>
> In the current implementation, StoreScanner calls
> matcher.getNextKeyHint() whenever it gets a SEEK_NEXT_USING_HINT. This in
> turn calls filter.getNextKeyHint(), where the filter at this stage is the
> FilterList. FilterList's implementation iterates through all of its
> filters and keeps the max KeyValue that it sees. This works fine if only
> one of the filters wrapped in the FilterList implements getNextKeyHint,
> but if several of them implement it, things get weird.
>
> For example:
> - Create two filters, a FuzzyRowFilter and a ColumnRangeFilter. Both of
> them implement getNextKeyHint.
> - Wrap them in a FilterList with MUST_PASS_ALL.
> - FuzzyRowFilter seeks to the correct first row and then passes it on to
> ColumnRangeFilter, which returns the SEEK_NEXT_USING_HINT code.
> - When getNextKeyHint is then called on the FilterList, it calls the one on
> FuzzyRowFilter first, which basically says what the next row should be,
> whereas in reality we want ColumnRangeFilter to give the seek hint. Since
> FilterList keeps the max hint, the next-row hint wins.
> - As a result, data that should be returned is skipped, which I have
> verified by using a RowFilter with a RegexStringComparator.
>
> I updated FilterList to remember which filter returned
> SEEK_NEXT_USING_HINT; in getNextKeyHint, I invoke the method on that saved
> filter and then reset the state. I tested it with my current queries and
> it works fine, but I need to run the entire test suite to make sure I have
> not introduced any regressions. I also need to figure out what the
> behavior should be when the operator is MUST_PASS_ONE, though I doubt it
> should be any different.
>
> Is my understanding that this is a bug correct? Or am I trivializing it
> and ignoring something important? If the explanation is hard to follow, I
> can open a JIRA and upload a patch against the 0.94 head.
>
> Thanks,
> Viral
>
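
The fix proposed in the quoted message (remember which filter returned SEEK_NEXT_USING_HINT, ask only that filter for the hint, then reset the state) can be sketched in a similarly simplified, self-contained Java model. This is a hypothetical illustration, not the actual patch or the HBase API: Key, RowPrefixFilter, ColumnRange, HintTrackingFilterList and the seekHintFilter field are invented names, and keys are plain row/column strings.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model (NOT the HBase API) of the proposed fix:
// the list records which filter asked for SEEK_NEXT_USING_HINT and asks
// only that filter for the hint.
class RememberedHintDemo {
    enum ReturnCode { INCLUDE, SKIP, SEEK_NEXT_USING_HINT }

    // Cell coordinate ordered by row, then column (stand-in for KeyValue).
    static final class Key implements Comparable<Key> {
        final String row, col;
        Key(String row, String col) { this.row = row; this.col = col; }
        public int compareTo(Key o) {
            int c = row.compareTo(o.row);
            return c != 0 ? c : col.compareTo(o.col);
        }
        @Override public String toString() { return row + "/" + col; }
    }

    interface Filter {
        ReturnCode filterKeyValue(Key k);
        Key getNextKeyHint(Key k); // null means "no hint"
    }

    // FuzzyRowFilter stand-in: includes rows starting with "r1"; its hint
    // always points to the smallest row after the current one.
    static final class RowPrefixFilter implements Filter {
        public ReturnCode filterKeyValue(Key k) {
            return k.row.startsWith("r1") ? ReturnCode.INCLUDE : ReturnCode.SEEK_NEXT_USING_HINT;
        }
        public Key getNextKeyHint(Key k) { return new Key(k.row + "\0", ""); }
    }

    // ColumnRangeFilter stand-in: keeps columns in [min, max), hints to min.
    static final class ColumnRange implements Filter {
        final String min, max;
        ColumnRange(String min, String max) { this.min = min; this.max = max; }
        public ReturnCode filterKeyValue(Key k) {
            if (k.col.compareTo(min) < 0) return ReturnCode.SEEK_NEXT_USING_HINT;
            return k.col.compareTo(max) < 0 ? ReturnCode.INCLUDE : ReturnCode.SKIP;
        }
        public Key getNextKeyHint(Key k) { return new Key(k.row, min); }
    }

    // MUST_PASS_ALL list that remembers the filter responsible for the seek.
    static final class HintTrackingFilterList implements Filter {
        final List<Filter> filters;
        Filter seekHintFilter; // set by filterKeyValue, cleared by getNextKeyHint
        HintTrackingFilterList(List<Filter> filters) { this.filters = filters; }

        public ReturnCode filterKeyValue(Key k) {
            seekHintFilter = null;
            for (Filter f : filters) {
                ReturnCode rc = f.filterKeyValue(k);
                if (rc == ReturnCode.SEEK_NEXT_USING_HINT) seekHintFilter = f;
                if (rc != ReturnCode.INCLUDE) return rc; // first non-INCLUDE decides
            }
            return ReturnCode.INCLUDE;
        }

        public Key getNextKeyHint(Key k) {
            if (seekHintFilter == null) return null;
            Key hint = seekHintFilter.getNextKeyHint(k);
            seekHintFilter = null; // reset: one hint per SEEK_NEXT_USING_HINT
            return hint;
        }
    }

    static HintTrackingFilterList demoList() {
        return new HintTrackingFilterList(
            Arrays.<Filter>asList(new RowPrefixFilter(), new ColumnRange("c", "f")));
    }

    public static void main(String[] args) {
        HintTrackingFilterList list = demoList();
        Key cur = new Key("r1", "a");
        System.out.println(list.filterKeyValue(cur)); // SEEK_NEXT_USING_HINT
        System.out.println(list.getNextKeyHint(cur)); // r1/c
    }
}
```

In this model, MUST_PASS_ALL returns on the first non-INCLUDE code, so at most one filter is recorded per key no matter how many filters in the list implement getNextKeyHint; that is one possible answer to the question above about three or more hinting filters. Because the state is reset after use, a second getNextKeyHint call without a new SEEK_NEXT_USING_HINT returns null. The sketch does not settle the MUST_PASS_ONE case, which the message leaves open.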