Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase, mail # user - FilterList: possible bug in getNextKeyHint


Copy link to this message
-
FilterList: possible bug in getNextKeyHint
Viral Bajaria 2013-07-29, 04:22
Hi,

I hit a weird issue/bug and am able to reproduce the error consistently.
The problem arises when FilterList has two filters where each implements
the getNextKeyHint method.

The way the current implementation works is, StoreScanner will call
matcher.getNextKeyHint() whenever it gets a SEEK_NEXT_USING_HINT. This in
turn will call filter.getNextKeyHint() which at this stage is of type
FilterList. The implementation in FilterList iterates through all the
filters and keeps the max KeyValue that it sees. All is fine if you wrap
filters in FilterList in which only one of them implements getNextKeyHint.
but if multiple of them implement then that's where things get weird.

For example:
- create two filters: one is FuzzyRowFilter and second is
ColumnRangeFilter. Both of them implement getNextKeyHint
- wrap them in FilterList with MUST_PASS_ALL
- FuzzyRowFilter will seek to the correct first row and then pass it to
ColumnRangeFilter which will return the SEEK_NEXT_USING_HINT code.
- Now in FilterList when getNextKeyHint is called, it calls the one on
FuzzyRow first which basically says what the next row should be. While in
reality we want the ColumnRangeFilter to give the seek hint.
- The above behavior skips data that should be returned, which I have
verified by using a RowFilter with RegexStringComparator.

I updated the FilterList to maintain state on which filter returns the
SEEK_NEXT_USING_HINT and in getNextKeyHint, I invoke the method on the
saved filter and reset that state. I tested it with my current queries and
it works fine but I need to run the entire test suite to make sure I have
not introduced any regression. In addition to that I need to figure out
what should be the behavior when the opeation is MUST_PASS_ONE, but I doubt
it should be any different.

Is my understanding of it being a bug correct ? Or am I trivializing it and
ignoring something very important ? If it's tough to wrap your head around
the explanation, then I can open a JIRA and upload a patch against 0.94
head.

Thanks,
Viral