-
ResultCode.NEXT_ROW and scans with batching enabled
David Koch 2013-01-23, 00:13
Hello,
In a scan, when a filter's filterKeyValue method returns ReturnCode.NEXT_ROW - does it actually skip to the next row or just the next batch, provided of course batching is enabled? Where in the HBase source code can I find out about this?
I spent some time looking at HRegion.java to get to grips with how filterRow works (or not) when batching is enabled. In HBase 0.92 hasFilterRow has not been overridden for certain filters which effectively do filter out rows (SingleColumnValueFilter for example). Thus, these filters do not generate a warning when used with a batched scan which - while risky - provides the needed filtering in some cases. This has been fixed for subsequent versions (at least 0.96) so I need to re-implement custom filters which use this "effect".
Thanks,
/David
-
Re: ResultCode.NEXT_ROW and scans with batching enabled
Ted Yu 2013-01-23, 00:48
Take a look at StoreScanner#next():

    ScanQueryMatcher.MatchCode qcode = matcher.match(kv);
    ...
    case SEEK_NEXT_ROW:
      // This is just a relatively simple end of scan fix, to short-cut end
      // us if there is an endKey in the scan.
      if (!matcher.moreRowsMayExistAfter(kv)) {
        return false;
      }
      reseek(matcher.getKeyForNextRow(kv));
      break;

Cheers
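To make the seek in the excerpt above more concrete, here is a minimal, self-contained sketch of the idea. It is not HBase code: `Cell`, `ORDER`, `LAST_QUALIFIER` and `seekNextRow` are hypothetical names, and the sketch assumes the seek key is one that sorts after every cell of the current row, so that seeking past it lands on the first cell of the following row.

```java
import java.util.Comparator;
import java.util.TreeSet;

/** Toy model of a seek to the next row; none of this is HBase code. */
class NextRowSeekSketch {

    /** Hypothetical stand-in for a KeyValue: just a row key and a column qualifier. */
    static final class Cell {
        final String row;
        final String qualifier;

        Cell(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }

        @Override
        public String toString() {
            return row + "/" + qualifier;
        }
    }

    /** Sorts by row first, then by qualifier (a much simplified key ordering). */
    static final Comparator<Cell> ORDER =
            Comparator.comparing((Cell c) -> c.row).thenComparing(c -> c.qualifier);

    /** Assumption: this sentinel sorts after every real qualifier in the toy model. */
    static final String LAST_QUALIFIER = String.valueOf(Character.MAX_VALUE);

    /** Returns the first cell of the row after current's row, or null if there is none. */
    static Cell seekNextRow(TreeSet<Cell> cells, Cell current) {
        // Build a key that sorts last within the current row, then step past it.
        Cell lastOnRow = new Cell(current.row, LAST_QUALIFIER);
        return cells.higher(lastOnRow);
    }

    public static void main(String[] args) {
        TreeSet<Cell> cells = new TreeSet<>(ORDER);
        cells.add(new Cell("row1", "a"));
        cells.add(new Cell("row1", "b"));
        cells.add(new Cell("row1", "c"));
        cells.add(new Cell("row2", "a"));

        // Seeking from the first cell of row1 skips row1/b and row1/c entirely.
        System.out.println(seekNextRow(cells, new Cell("row1", "a"))); // prints row2/a
    }
}
```

The point of the sketch is only that the remaining cells of the current row are never visited after the seek, regardless of how many of them there are.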
-
RE: ResultCode.NEXT_ROW and scans with batching enabled
Anoop Sam John 2013-01-23, 03:44
Hi,
> In a scan, when a filter's filterKeyValue method returns ReturnCode.NEXT_ROW - does it actually skip to the next row or just the next batch
It will go to the next row.
> In HBase 0.92 hasFilterRow has not been overridden for certain filters which effectively do filter out rows (SingleColumnValueFilter for example).
Yes, this is an issue in old versions. It is fixed in trunk now.
> I spent some time looking at HRegion.java to get to grips with how filterRow works (or not) when batching is enabled.
See the method RegionScannerImpl#nextInternal(int limit) in HRegion.java. It contains a do-while loop that collects all the KVs for a row, so that they can be grouped into one Result. The loop only checks the batch size (limit). When the filter says to go to the next row, there is a seek to the next row (as Ted said, see the code in StoreScanner). This makes peekRow() return the next row key, which is not the same as currentRow (please see the code). So this batch ends there, and the next batch will contain only KVs from the next row.
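The behaviour described above can be illustrated with a small, self-contained simulation. This is a sketch and not the HRegion code: `BatchedScanSketch`, `Cell`, `scan` and `sampleCells` are hypothetical names, the `ReturnCode` enum is a toy with only three values, and the loop is a simplified model in which a batch ends either when it reaches the limit or when the row changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Toy model of a batched scan; none of this is HBase code. */
class BatchedScanSketch {

    /** Toy counterpart of the filter return codes discussed in the thread. */
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    /** Hypothetical stand-in for a KeyValue: a row key plus a column qualifier. */
    static final class Cell {
        final String row;
        final String qualifier;

        Cell(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }
    }

    /** Scans row-sorted cells and returns batches of at most limit cells, never spanning rows. */
    static List<List<String>> scan(List<Cell> cells, int limit, Function<Cell, ReturnCode> filter) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be at least 1");
        }
        List<List<String>> batches = new ArrayList<>();
        int i = 0;
        while (i < cells.size()) {
            String currentRow = cells.get(i).row;
            List<String> batch = new ArrayList<>();
            // Collect cells until the batch is full or the row changes.
            while (i < cells.size() && cells.get(i).row.equals(currentRow) && batch.size() < limit) {
                Cell cell = cells.get(i);
                ReturnCode code = filter.apply(cell);
                if (code == ReturnCode.NEXT_ROW) {
                    // Model of the seek: jump over every remaining cell of this row.
                    while (i < cells.size() && cells.get(i).row.equals(currentRow)) {
                        i++;
                    }
                    break; // the batch ends here, even if it is not full
                }
                if (code == ReturnCode.INCLUDE) {
                    batch.add(cell.row + "/" + cell.qualifier);
                }
                i++;
            }
            if (!batch.isEmpty()) {
                batches.add(batch);
            }
        }
        return batches;
    }

    /** Sample data: row1 has columns a..e, row2 has columns a and b. */
    static List<Cell> sampleCells() {
        return Arrays.asList(
                new Cell("row1", "a"), new Cell("row1", "b"), new Cell("row1", "c"),
                new Cell("row1", "d"), new Cell("row1", "e"),
                new Cell("row2", "a"), new Cell("row2", "b"));
    }

    public static void main(String[] args) {
        // Without filtering, row1 is split into two batches of at most 3 cells.
        System.out.println(scan(sampleCells(), 3, c -> ReturnCode.INCLUDE));
        // prints [[row1/a, row1/b, row1/c], [row1/d, row1/e], [row2/a, row2/b]]

        // Returning NEXT_ROW at row1/c drops c, d and e, not just the current batch.
        System.out.println(scan(sampleCells(), 3, c ->
                c.row.equals("row1") && c.qualifier.equals("c")
                        ? ReturnCode.NEXT_ROW : ReturnCode.INCLUDE));
        // prints [[row1/a, row1/b], [row2/a, row2/b]]
    }
}
```

In the second call the first batch ends with two cells although the limit is three, and row1/d and row1/e never appear in a later batch, which matches the explanation that the next batch contains only KVs from the next row.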
-Anoop-
-
Re: ResultCode.NEXT_ROW and scans with batching enabled
David Koch 2013-01-23, 23:32
Hi guys,
Thank you for the explanations.
/David