HBase dev mailing list


Regarding order of filters in FilterList
Hi all,

Something came up while I was going through the Filter code.

Suppose I am using a FilterList with my Scan. The list contains one PageFilter (page size = N) and one SingleColumnValueFilter. [One filter checks a column value; the other limits the number of rows in the result.] So as a user, what I expect from this usage is to get N rows where colval=X.
Now, if I create my FilterList as below, things work fine:

FilterList list = new FilterList();  // default operator: MUST_PASS_ALL
SingleColumnValueFilter f = new SingleColumnValueFilter(..);
f.setLatestVersionOnly(false);
list.add(f);
list.add(new PageFilter(..));        // row-counting filter added last

Now take the same code, changing only the order in which the filters are added:

FilterList list = new FilterList();  // default operator: MUST_PASS_ALL
list.add(new PageFilter(..));        // row-counting filter added first
SingleColumnValueFilter f = new SingleColumnValueFilter(..);
f.setLatestVersionOnly(false);
list.add(f);

As a user, I would expect the same result, but that is not guaranteed; this version can even return empty results.
Here the filter that deals with the number of returned rows comes first. So even if the second filter later filters out the KV or row, the counter tracked inside the first filter has already been incremented. [The first filter never gets a chance to roll back that operation when the next filter filters out the row/KV.]
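The effect can be reproduced with a small self-contained toy model. This is not HBase code: RowFilter, CountingFilter, ValueFilter and MustPassAllList are hypothetical stand-ins for Filter, PageFilter, SingleColumnValueFilter and a MUST_PASS_ALL FilterList. The model assumes the list stops at the first filter that excludes a row, which is what makes the first ordering safe. With a page size of 2 and wanted value "X", putting the counting filter first uses up the page on non-matching rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only, NOT the HBase API: RowFilter, CountingFilter, ValueFilter and
// MustPassAllList are hypothetical stand-ins for Filter, PageFilter,
// SingleColumnValueFilter and FilterList(MUST_PASS_ALL).
public class FilterOrderDemo {

    interface RowFilter {
        /** Returns true to EXCLUDE the row (same convention as filterRow()). */
        boolean filterRow(String value);

        /** Returns true once the filter wants the scan to stop. */
        default boolean filterAllRemaining() { return false; }
    }

    /** Stand-in for PageFilter: counts every row it is asked about. */
    static class CountingFilter implements RowFilter {
        private final int pageSize;
        private int rowsAccepted = 0;

        CountingFilter(int pageSize) { this.pageSize = pageSize; }

        @Override public boolean filterRow(String value) {
            rowsAccepted++; // counted even if a later filter rejects the row; no rollback
            return rowsAccepted > pageSize;
        }

        @Override public boolean filterAllRemaining() { return rowsAccepted >= pageSize; }
    }

    /** Stand-in for SingleColumnValueFilter: keeps rows whose value equals 'wanted'. */
    static class ValueFilter implements RowFilter {
        private final String wanted;

        ValueFilter(String wanted) { this.wanted = wanted; }

        @Override public boolean filterRow(String value) { return !wanted.equals(value); }
    }

    /** Stand-in for a MUST_PASS_ALL list: stops at the first filter that excludes the row. */
    static class MustPassAllList implements RowFilter {
        private final List<RowFilter> filters = new ArrayList<>();

        void add(RowFilter f) { filters.add(f); }

        @Override public boolean filterRow(String value) {
            for (RowFilter f : filters) {
                if (f.filterRow(value)) return true; // short-circuit: later filters are skipped
            }
            return false;
        }

        @Override public boolean filterAllRemaining() {
            for (RowFilter f : filters) {
                if (f.filterAllRemaining()) return true;
            }
            return false;
        }
    }

    /** Scans rows (one column value per row) and returns the indices of the rows kept. */
    static List<Integer> scan(List<String> rows, RowFilter filter) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (filter.filterAllRemaining()) break;
            if (!filter.filterRow(rows.get(i))) kept.add(i);
        }
        return kept;
    }

    /** Page size 2, wanted value "X"; counterLast selects the filter order. */
    static List<Integer> run(boolean counterLast) {
        List<String> rows = Arrays.asList("Y", "Y", "X", "Y", "X", "X", "X");
        MustPassAllList list = new MustPassAllList();
        if (counterLast) {
            list.add(new ValueFilter("X"));
            list.add(new CountingFilter(2));
        } else {
            list.add(new CountingFilter(2));
            list.add(new ValueFilter("X"));
        }
        return scan(rows, list);
    }

    public static void main(String[] args) {
        System.out.println("value filter first: " + run(true));  // [2, 4]
        System.out.println("page filter first:  " + run(false)); // []
    }
}
```

In the second order, the two leading "Y" rows are rejected by the value filter only after the counter has already reached 2, so the scan stops before any matching row is seen.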

If we follow the first approach, placing the filters that deal with the number of rows or KVs last in the FilterList, there won't be any problem. Does anyone feel this is an issue with our filter framework and FilterList? At the least, I think we should document this clearly. Please give your suggestions.

Note: in the code samples above, if we use latestVersionOnly=true for the SingleColumnValueFilter, even the second scenario works fine. The difference is that skipping a row is then handled by filterRowKey() itself, and filterRow() is not called for the ignored row key. [PageFilter deals with filterRow().]
So it all depends on which of the APIs a filter uses to decide whether to include, skip, or seek to rows.
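The point about which API a filter uses can be illustrated with a second toy model, again hypothetical and not the HBase API. It assumes a two-stage evaluation: every filter's early hook (filterEarly(), a made-up stand-in for hooks such as filterRowKey()) runs before any filter's filterRow(), and a row skipped in the early stage never reaches filterRow(). Because the value filter here rejects rows in the early stage, the counter only ever sees matching rows, so the order no longer matters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only, NOT the HBase API. Each hypothetical filter has two hooks:
// filterEarly() stands in for earlier per-row hooks (e.g. filterRowKey()),
// filterRow() for the final per-row decision that PageFilter relies on.
public class FilterStageDemo {

    interface StagedFilter {
        default boolean filterEarly(String value) { return false; } // true = skip the row now
        default boolean filterRow() { return false; }               // true = exclude the row
        default boolean filterAllRemaining() { return false; }      // true = stop the scan
    }

    /** Stand-in for PageFilter: decides (and counts) only in filterRow(). */
    static class CountingFilter implements StagedFilter {
        private final int pageSize;
        private int rowsAccepted = 0;

        CountingFilter(int pageSize) { this.pageSize = pageSize; }

        @Override public boolean filterRow() {
            rowsAccepted++; // only reached for rows that survived the early stage
            return rowsAccepted > pageSize;
        }

        @Override public boolean filterAllRemaining() { return rowsAccepted >= pageSize; }
    }

    /** Stand-in for a value filter that rejects non-matching rows in the early stage. */
    static class EarlyValueFilter implements StagedFilter {
        private final String wanted;

        EarlyValueFilter(String wanted) { this.wanted = wanted; }

        @Override public boolean filterEarly(String value) { return !wanted.equals(value); }
    }

    static List<Integer> scan(List<String> rows, List<StagedFilter> filters) {
        List<Integer> kept = new ArrayList<>();
        rowLoop:
        for (int i = 0; i < rows.size(); i++) {
            for (StagedFilter f : filters) {
                if (f.filterAllRemaining()) return kept;
            }
            // Stage 1: if any filter skips the row here, no filter's filterRow() runs for it.
            for (StagedFilter f : filters) {
                if (f.filterEarly(rows.get(i))) continue rowLoop;
            }
            // Stage 2: final decision; stops at the first filter that excludes the row.
            boolean excluded = false;
            for (StagedFilter f : filters) {
                if (f.filterRow()) { excluded = true; break; }
            }
            if (!excluded) kept.add(i);
        }
        return kept;
    }

    /** Page size 2, wanted value "X"; counterLast selects the filter order. */
    static List<Integer> run(boolean counterLast) {
        List<String> rows = Arrays.asList("Y", "Y", "X", "Y", "X", "X", "X");
        List<StagedFilter> filters = new ArrayList<>();
        if (counterLast) {
            filters.add(new EarlyValueFilter("X"));
            filters.add(new CountingFilter(2));
        } else {
            filters.add(new CountingFilter(2));
            filters.add(new EarlyValueFilter("X"));
        }
        return scan(rows, filters);
    }

    public static void main(String[] args) {
        System.out.println("value filter first: " + run(true));  // [2, 4]
        System.out.println("page filter first:  " + run(false)); // [2, 4]
    }
}
```

Whether a real filter rejects in an early hook or only in filterRow() has to be checked against the filter's own implementation; the model only shows why that choice decides whether order matters.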
Also, there is one open issue regarding FilterList, HBASE-6132. Please give your valuable thoughts and suggestions.
-Anoop-