HBase, mail # dev - InternalScanner next(..) methods


Re: InternalScanner next(..) methods
Matt Corgan 2012-12-11, 07:16
>
> unless we force the filter code to make a copy of any KV it wants to hold
> on to (which would make the above method extremely expensive).

I was thinking we force the copy for the few filters that do need it.  Then
it's no more expensive than the current situation where all filters are
forcing the copy?  Right now it's just happening lower down the stack.

We could add a CellTool.enforceIndependent(cell) method that only makes a
copy if the cell is not already a KeyValue or other non-transient
implementation. That is slightly more robust than an "instanceof KeyValue"
check.
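
A rough sketch of that idea, not a proposal for the actual API. Everything
here is hypothetical and stands in for the real classes: the Cell interface,
its isTransient() and copyBytes() methods, OwnedCell (playing the role of
KeyValue), and BufferViewCell (a cell backed by memory the scanner may reuse).
The point is only that the copy happens on demand, for callers that want to
hold on to a cell.

```java
import java.util.Arrays;

public class EnforceIndependentSketch {

    /** Hypothetical minimal cell abstraction; NOT the real HBase interface. */
    interface Cell {
        /** Returns a fresh copy of this cell's bytes. */
        byte[] copyBytes();
        /** True if the backing memory may be overwritten by the scanner. */
        boolean isTransient();
    }

    /** Stand-in for KeyValue: owns its byte array, so it is safe to retain. */
    static final class OwnedCell implements Cell {
        private final byte[] bytes;
        OwnedCell(byte[] bytes) { this.bytes = bytes; }
        public byte[] copyBytes() { return Arrays.copyOf(bytes, bytes.length); }
        public boolean isTransient() { return false; }
    }

    /** A view over a shared buffer that the scanner is free to reuse. */
    static final class BufferViewCell implements Cell {
        private final byte[] shared;
        private final int offset;
        private final int length;
        BufferViewCell(byte[] shared, int offset, int length) {
            this.shared = shared;
            this.offset = offset;
            this.length = length;
        }
        public byte[] copyBytes() {
            return Arrays.copyOfRange(shared, offset, offset + length);
        }
        public boolean isTransient() { return true; }
    }

    /** Copies only when the cell is transient; otherwise returns it as-is. */
    static Cell enforceIndependent(Cell cell) {
        return cell.isTransient() ? new OwnedCell(cell.copyBytes()) : cell;
    }

    public static void main(String[] args) {
        byte[] buffer = {1, 2, 3, 4};
        Cell view = new BufferViewCell(buffer, 1, 2);

        // A filter that wants to keep the cell asks for an independent one.
        Cell kept = enforceIndependent(view);

        // The scanner reuses the buffer for the next cell.
        buffer[1] = 9;

        System.out.println(Arrays.toString(kept.copyBytes())); // [2, 3]
        System.out.println(Arrays.toString(view.copyBytes())); // [9, 3]

        // An already independent cell is returned without copying.
        Cell owned = new OwnedCell(new byte[] {7});
        System.out.println(enforceIndependent(owned) == owned); // true
    }
}
```

Asking the cell whether it is transient, instead of checking its concrete
class, means new non-transient implementations would not need to be listed
anywhere.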

I'm also trying to understand how this filter method works if limit=1 is
passed to the next(results, limit) method, or any time the limit is less
than the row width.

On Mon, Dec 10, 2012 at 10:56 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> The offending method here is:
>
>   public void filterRow(List<KeyValue> kvs);
> on Filter.java
>
> But even if we changed that to accept a stream of KVs, how are we going to
> make sure the filter code does not hold on to the KVs it received?
> As long as we allow custom code in filters we cannot reuse the memory
> backing the KVs, unless we force the filter code to make a copy of any KV
> it wants to hold on to (which would make the above method extremely
> expensive).
>
>
> Of our pre-canned filters only DependentColumnFilter implements this
> method.
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Stack <[EMAIL PROTECTED]>
> To: HBase Dev List <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Monday, December 10, 2012 12:26 PM
> Subject: Re: InternalScanner next(..) methods
>
> On Sun, Dec 9, 2012 at 12:52 PM, lars hofhansl <[EMAIL PROTECTED]>
> wrote:
>
> > This method specifically only works when this is a heap of StoreScanners
> > (i.e. on the RegionScanner level), which is very confusing (to me anyway).
> > Maybe we should have two separate KeyValueHeap implementations to make it
> > less confusing.
> >
> > The list here comprises KVs for the same row key. These KVs need to be
> > collected together so that Filters can operate on entire rows.
> >
> >
> Should we change the Filter interface instead, giving it a stream rather
> than the "whole row"?
>
>
>
> > I just looked at that code this week. We need to fix this stuff. :)
> >
> >
> +1
>
> St.Ack
>