Well... To be fair, right now KVs are only copied when some encoding was used, otherwise they are not copied at all.
enforceIndependent sounds like a reasonable approach, or maybe copyIfNeeded, or whatever.
Some filters (those that use filterRow(List)) are not compatible with batching.
Also note that your needs for PrefixTries might be very specific. The other encodings always store - as far as I can see at least - whole row keys and whole column families and qualifiers that we principally could reuse for the lifetime of the row without breaking existing behavior if the KeyValue interface was changed accordingly.
From: Matt Corgan <[EMAIL PROTECTED]>
To: dev <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
Sent: Monday, December 10, 2012 11:16 PM
Subject: Re: InternalScanner next(..) methods
> unless we force the filter code to make a copy of any KV it wants to hold
> on (which would make the above method extremely expensive).
I was thinking we force the copy for the few filters that do need it. Then
it's no more expensive than the current situation where all filters are
forcing the copy? Right now it's just happening lower down the stack.
We could add a CellTool.enforceIndependent(cell) method that only makes a
copy if it's not already a KeyValue or other non-transient implementation.
Slightly more robust than doing "instanceof KeyValue" check.
Trying to understand too, how does this filter method work if limit=1 is
passed to the next(results, limit) method)? Or any time the limit is less
than the row width.
On Mon, Dec 10, 2012 at 10:56 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> The offending method here is:
> public void filterRow(List<KeyValue> kvs);
> on Filter.java
> But even if we changed that to accept a stream of KVs, how are we going to
> make sure the filter code does not hold on to the KVs it received?
> As long as we allow custom code in filter we cannot reuse the memory
> backing the KVs, unless we force the filter code to make a copy of any KV
> it wants to hold on (which would make the above method extremely expensive).
> Of our pre-canned filters only DependentColumnFilter implements this
> -- Lars
> From: Stack <[EMAIL PROTECTED]>
> To: HBase Dev List <[EMAIL PROTECTED]>; lars hofhansl <
> [EMAIL PROTECTED]>
> Sent: Monday, December 10, 2012 12:26 PM
> Subject: Re: InternalScanner next(..) methods
> On Sun, Dec 9, 2012 at 12:52 PM, lars hofhansl <[EMAIL PROTECTED]>
> > This method specifically only works when this is a heap of StoreScanners
> > (i.e. on the RegionScanner level), which is very confusing (to me
> > Maybe we should have two separate KeyValueHeap implementation to make it
> > less confusing.
> > The list here comprises KVs for the same row key. These KVs need to be
> > collected together so that Filters can operate on entire rows.
> We should change Filter Interface instead giving it a stream rather than
> "whole row"?
> > I just looked at that code this week. We need to fix this stuff. :)