HBase >> mail # user >> ResultCode.NEXT_ROW and scans with batching enabled


David Koch 2013-01-23, 00:13
Ted Yu 2013-01-23, 00:48
RE: ResultCode.NEXT_ROW and scans with batching enabled
Hi,

>In a scan, when a filter's filterKeyValue method returns
>ReturnCode.NEXT_ROW - does it actually skip to the next row or just the
>next batch

It will go to the next row.

>In HBase 0.92
> hasFilterRow has not been overridden for certain filters which effectively
> do filter out rows (SingleColumnValueFilter for example).

Yes, this is an issue in old versions. It is fixed in trunk now.

> I spent some time looking at HRegion.java to get to grips with how
> filterRow works (or not) when batching is enabled.

See the method RegionScannerImpl#nextInternal(int limit) in HRegion.java. It contains a do-while loop that collects the KVs of one row, so that they can be grouped into a single Result. Within a row, this loop only checks the batch size (limit). When the filter says to go to the next row, there is a seek to the next row (as Ted said, see the code in StoreScanner). After that seek, peekRow() returns the next row key, which is not the same as currentRow (please see the code). So the current batch ends there, and the next batch contains KVs from the next row only.

-Anoop-
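
To make the behaviour described above concrete, here is a minimal, self-contained sketch. It is a simplified model and not the code in HRegion.java: the names BatchedScanSketch, Cell, CellFilter and scan are hypothetical, the "seek" is simulated by skipping list entries, and whether the real scanner emits anything for a row that was cut short is not modelled. It only illustrates that a batch never crosses a row boundary and that NEXT_ROW drops the rest of the row, not just the rest of the batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BatchedScanSketch {
    /** Hypothetical stand-in for a KeyValue: just a row key and a qualifier. */
    static final class Cell {
        final String row;
        final String qualifier;
        Cell(String row, String qualifier) { this.row = row; this.qualifier = qualifier; }
    }

    /** Only the two return codes needed for this illustration. */
    enum ReturnCode { INCLUDE, NEXT_ROW }

    interface CellFilter { ReturnCode filterKeyValue(Cell cell); }

    /** Groups row-sorted cells into batches of at most `limit` cells, one row per batch. */
    static List<List<String>> scan(List<Cell> sorted, int limit, CellFilter filter) {
        if (limit < 1) throw new IllegalArgumentException("limit must be >= 1");
        List<List<String>> batches = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            String currentRow = sorted.get(i).row;
            List<String> batch = new ArrayList<>();
            // Collect cells while we are still in currentRow and the batch is not full.
            while (i < sorted.size() && sorted.get(i).row.equals(currentRow) && batch.size() < limit) {
                Cell cell = sorted.get(i);
                if (filter.filterKeyValue(cell) == ReturnCode.NEXT_ROW) {
                    // Simulated seek: skip every remaining cell of the current row.
                    while (i < sorted.size() && sorted.get(i).row.equals(currentRow)) i++;
                    break;
                }
                batch.add(cell.row + ":" + cell.qualifier);
                i++;
            }
            if (!batch.isEmpty()) batches.add(batch);
        }
        return batches;
    }

    /** Row r1 has cells a..d, row r2 has cells a..c; the filter gives up on r1 at qualifier c. */
    static List<List<String>> demo() {
        List<Cell> cells = Arrays.asList(
            new Cell("r1", "a"), new Cell("r1", "b"), new Cell("r1", "c"), new Cell("r1", "d"),
            new Cell("r2", "a"), new Cell("r2", "b"), new Cell("r2", "c"));
        CellFilter filter = c -> (c.row.equals("r1") && c.qualifier.equals("c"))
            ? ReturnCode.NEXT_ROW : ReturnCode.INCLUDE;
        return scan(cells, 2, filter);
    }

    public static void main(String[] args) {
        // Prints [[r1:a, r1:b], [r2:a, r2:b], [r2:c]]
        System.out.println(demo());
    }
}
```

Note that r1:d never shows up in any batch: once the filter returns NEXT_ROW at r1:c, the following batch starts at r2.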
________________________________________
From: Ted Yu [[EMAIL PROTECTED]]
Sent: Wednesday, January 23, 2013 6:18 AM
To: [EMAIL PROTECTED]
Subject: Re: ResultCode.NEXT_ROW and scans with batching enabled

Take a look at StoreScanner#next():

        ScanQueryMatcher.MatchCode qcode = matcher.match(kv);
...
          case SEEK_NEXT_ROW:
            // This is just a relatively simple end of scan fix, to short-cut end
            // us if there is an endKey in the scan.
            if (!matcher.moreRowsMayExistAfter(kv)) {
              return false;
            }
            reseek(matcher.getKeyForNextRow(kv));
            break;

Cheers
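
The idea behind reseek(matcher.getKeyForNextRow(kv)) can be sketched without any HBase classes. The snippet below is an illustration under stated assumptions, not the StoreScanner implementation: keys are plain "row/qualifier" strings in a TreeSet, and keyForNextRow and seekNextRow are hypothetical helpers. It builds a key that sorts after every key of the current row and then jumps to the first stored key at or beyond it.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

final class NextRowSeekSketch {
    /**
     * Builds a key that sorts after every "row/qualifier" key of the same row.
     * Assumption: no qualifier starts with the character '\uffff'.
     */
    static String keyForNextRow(String key) {
        String row = key.substring(0, key.indexOf('/'));
        return row + "/\uffff";
    }

    /** Returns the first key of a following row, or null if there is none. */
    static String seekNextRow(NavigableSet<String> keys, String current) {
        return keys.ceiling(keyForNextRow(current));
    }

    public static void main(String[] args) {
        NavigableSet<String> keys =
            new TreeSet<>(Arrays.asList("r1/a", "r1/b", "r1/c", "r2/a", "r2/b"));
        System.out.println(seekNextRow(keys, "r1/a")); // prints r2/a
        System.out.println(seekNextRow(keys, "r2/b")); // prints null
    }
}
```

A seek of this kind skips the remaining cells of the row in one step instead of reading and discarding them one by one.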

On Tue, Jan 22, 2013 at 4:13 PM, David Koch <[EMAIL PROTECTED]> wrote:

> Hello,
>
> In a scan, when a filter's filterKeyValue method returns
> ReturnCode.NEXT_ROW - does it actually skip to the next row or just the
> next batch, provided of course batching is enabled? Where in the HBase
> source code can I find out about this?
>
> I spent some time looking at HRegion.java to get to grips with how
> filterRow works (or not) when batching is enabled. In HBase 0.92
> hasFilterRow has not been overridden for certain filters which effectively
> do filter out rows (SingleColumnValueFilter for example). Thus, these
> filters do not generate a warning when used with a batched scan which -
> while risky - provides the needed filtering in some cases. This has been
> fixed for subsequent versions (at least 0.96) so I need to re-implement
> custom filters which use this "effect".
>
> Thanks,
>
> /David
>
David Koch 2013-01-23, 23:32