Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
HBase, mail # dev - InternalScanner next(..) methods


+
Matt Corgan 2012-12-09, 07:27
+
lars hofhansl 2012-12-09, 20:52
+
Matt Corgan 2012-12-10, 01:03
+
Stack 2012-12-10, 20:26
+
Dave Latham 2012-12-11, 18:54
+
lars hofhansl 2012-12-11, 06:56
+
Matt Corgan 2012-12-11, 07:16
+
lars hofhansl 2012-12-11, 07:28
Copy link to this message
-
Re: InternalScanner next(..) methods
Matt Corgan 2012-12-11, 08:08
>
> The other encodings always store whole row keys and whole column families
> and qualifiers..

I think you are right but are forgetting about the value part of the cell.
 The other encoders are somewhere between the prefix-tree and the original
KeyValue format.  They concatenate row/family/qualifier/timestamp/type
together into a single array, but value lives separately in a disconnected
part of the data block.  So they also require the Cell interface to get
through the read path unless you want to copy the value over into the
"buffer" before spawning the KeyValue wrapper.  You can see the current
code copying key+value together in
BufferedEncodedSeeker.getKeyValueBuffer().
On Mon, Dec 10, 2012 at 11:28 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Well... To be fair, right now KVs are only copied when some encoding was
> used, otherwise they are not copied at all.
> enforceIndependent sounds like a reasonable approach, or maybe
> copyIfNeeded, or whatever.
>
>
> Some filters (those that use filterRow(List)) are not compatible with
> batching.
>
>
> Also note that your needs for PrefixTries might be very specific. The
> other encodings always store - as far as I can see at least - whole row
> keys and whole column families and qualifiers that we principally could
> reuse for the lifetime of the row without breaking existing behavior if the
> KeyValue interface was changed accordingly.
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Matt Corgan <[EMAIL PROTECTED]>
> To: dev <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Monday, December 10, 2012 11:16 PM
> Subject: Re: InternalScanner next(..) methods
>
> >
> > unless we force the filter code to make a copy of any KV it wants to hold
> > on (which would make the above method extremely expensive).
>
> I was thinking we force the copy for the few filters that do need it.  Then
> it's no more expensive than the current situation where all filters are
> forcing the copy?  Right now it's just happening lower down the stack.
>
> We could add a CellTool.enforceIndependent(cell) method that only makes a
> copy if it's not already a KeyValue or other non-transient implementation.
> Slightly more robust than doing "instanceof KeyValue" check.
>
> Trying to understand too, how does this filter method work if limit=1 is
> passed to the next(results, limit) method)?  Or any time the limit is less
> than the row width.
>
> On Mon, Dec 10, 2012 at 10:56 PM, lars hofhansl <[EMAIL PROTECTED]>
> wrote:
>
> > The offending method here is:
> >
> >   public void filterRow(List<KeyValue> kvs);
> > on Filter.java
> >
> > But even if we changed that to accept a stream of KVs, how are we going
> to
> > make sure the filter code does not hold on to the KVs it received?
> > As long as we allow custom code in filter we cannot reuse the memory
> > backing the KVs, unless we force the filter code to make a copy of any KV
> > it wants to hold on (which would make the above method extremely
> expensive).
> >
> >
> > Of our pre-canned filters only DependentColumnFilter implements this
> > method.
> >
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> >  From: Stack <[EMAIL PROTECTED]>
> > To: HBase Dev List <[EMAIL PROTECTED]>; lars hofhansl <
> > [EMAIL PROTECTED]>
> > Sent: Monday, December 10, 2012 12:26 PM
> > Subject: Re: InternalScanner next(..) methods
> >
> > On Sun, Dec 9, 2012 at 12:52 PM, lars hofhansl <[EMAIL PROTECTED]>
> > wrote:
> >
> > > This method specifically only works when this is a heap of
> StoreScanners
> > > (i.e. on the RegionScanner level), which is very confusing (to me
> > anyway).
> > > Maybe we should have two separate KeyValueHeap implementation to make
> it
> > > less confusing.
> > >
> > > The list here comprises KVs for the same row key. These KVs need to be
> > > collected together so that Filters can operate on entire rows.
> > >
> > >
> > We should change Filter Interface instead giving it a stream rather than
+
Stack 2012-12-10, 20:24