HBase >> mail # dev >> InternalScanner next(..) methods


Re: InternalScanner next(..) methods
Seems like the two uses of the KeyValueHeap have diverged enough that they
deserve separate implementations, like you say Lars.  RegionHeap needs the
row buffering, but StoreHeap apparently does not.  Does the RegionHeap need
lazy-seek methods?

If we separate them we might be able to get Cells through the StoreHeap
which will buy us some big speed increases, especially on compactions where
we don't have to use the RegionHeap.  During a compaction, if we can pass
each Cell immediately from the heap to the CellOutputStream, then we never
need to inflate it into a KeyValue.  Looking at the usage of
"List<KeyValue> kvs" at the bottom of Compactor.java, I don't see anything
preventing this.  Looks like the List<KeyValue> is just used to move 10
cells at a time instead of 1, which I don't think buys much.
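
To make the idea concrete, here is a minimal sketch of a heap-based merge that hands each element to an output sink as soon as it is polled, with no intermediate List. Everything in it is hypothetical: StreamingMergeSketch, Source and mergeTo are illustration names, plain Strings stand in for Cells, and a java.util.function.Consumer stands in for the CellOutputStream. It is not the actual KeyValueHeap or Compactor code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical stand-in for the discussion above; not HBase's KeyValueHeap or Compactor.
final class StreamingMergeSketch {

    /** One sorted input plus its current head element. */
    private static final class Source implements Comparable<Source> {
        final Iterator<String> it;
        String head;

        Source(Iterator<String> it) {
            this.it = it;
            this.head = it.next(); // caller guarantees the input is non-empty
        }

        /** Moves to the next element; returns false when the input is exhausted. */
        boolean advance() {
            if (it.hasNext()) {
                head = it.next();
                return true;
            }
            return false;
        }

        @Override
        public int compareTo(Source other) {
            return head.compareTo(other.head);
        }
    }

    /** Merges sorted inputs and passes each element to the sink as soon as it is polled. */
    static void mergeTo(List<List<String>> sortedInputs, Consumer<String> sink) {
        PriorityQueue<Source> heap = new PriorityQueue<>();
        for (List<String> input : sortedInputs) {
            if (!input.isEmpty()) {
                heap.add(new Source(input.iterator()));
            }
        }
        while (!heap.isEmpty()) {
            Source top = heap.poll();
            sink.accept(top.head);   // one element at a time, no batching List
            if (top.advance()) {
                heap.add(top);       // re-insert so the global order stays correct
            }
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        mergeTo(Arrays.asList(
                Arrays.asList("row1/a", "row3/a"),
                Arrays.asList("row2/a", "row4/a")), out::add);
        System.out.println(out); // [row1/a, row2/a, row3/a, row4/a]
    }
}
```

The source is put back on the heap after every element, so the sink always sees a globally sorted stream. Whether that per-element heap operation is cheaper than batching would need measuring against the real code.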
On Sun, Dec 9, 2012 at 12:52 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> This method specifically only works when this is a heap of StoreScanners
> (i.e. on the RegionScanner level), which is very confusing (to me anyway).
> Maybe we should have two separate KeyValueHeap implementations to make it
> less confusing.
>
> The list here comprises KVs for the same row key. These KVs need to be
> collected together so that Filters can operate on entire rows.
>
> I just looked at that code this week. We need to fix this stuff. :)
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Matt Corgan <[EMAIL PROTECTED]>
> To: dev <[EMAIL PROTECTED]>
> Sent: Saturday, December 8, 2012 11:27 PM
> Subject: InternalScanner next(..) methods
>
> I'm looking at the KeyValueHeap trying to see how we can make it work with
> Cells.  I'm curious, in this method
>
>   @Override
>   public boolean next(List<KeyValue> result, int limit, String metric) throws IOException {
>     if (this.current == null) {
>       return false;
>     }
>     InternalScanner currentAsInternal = (InternalScanner)this.current;
>     boolean mayContainMoreRows = currentAsInternal.next(result, limit, metric);
>
> how is it getting multiple results from a single scanner without putting
> the scanner back on the heap?  Couldn't that skip KeyValues?  Is it that
> it's only used at the Region level where the family-per-file semantics
> guarantee that all KeyValues in a single family will sort together?
>
> My bigger question is regarding the next(List<KeyValue> result, int limit)
> methods from the InternalScanner interface.  What's the reasoning for
> getting multiple results in one call as opposed to calling the next()
> method a bunch of times?  Buffering the KeyValues in a List like that means
> the Cells would have to be expanded into full KeyValues which would be nice
> to avoid.  Is there some logic that depends on getting a whole row of
> values, even though you may only get a partial row due to the limit param?
>
> Similarly, I see there is Filter.filterRow(List<KeyValue>) which looks like
> it's barely used.  Is that an important method?  Doesn't look like it's
> used much, but maybe people have custom Filters that need it.
>
> Thanks,
> Matt
>
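
The row buffering discussed in this thread (collecting the cells of one row so that a row-level filter can see them together, subject to a limit) can be sketched roughly as follows. This is an illustration under stated assumptions, not HBase code: RowBufferSketch, its Cell class and nextRow are hypothetical names, and the input is assumed to be a single stream already sorted by row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; the Cell type and nextRow method are not HBase APIs.
final class RowBufferSketch {

    static final class Cell {
        final String row;
        final String qualifier;

        Cell(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }

        @Override
        public String toString() {
            return row + "/" + qualifier;
        }
    }

    private final Iterator<Cell> sorted;
    private Cell pending; // next cell to return, already read from the iterator

    RowBufferSketch(List<Cell> sortedCells) {
        this.sorted = sortedCells.iterator();
        this.pending = sorted.hasNext() ? sorted.next() : null;
    }

    /**
     * Appends cells of the current row to result, at most limit of them.
     * Returns true if any cells remain after this call.
     */
    boolean nextRow(List<Cell> result, int limit) {
        if (pending == null) {
            return false;
        }
        String row = pending.row;
        int added = 0;
        while (pending != null && pending.row.equals(row) && added < limit) {
            result.add(pending);
            added++;
            pending = sorted.hasNext() ? sorted.next() : null;
        }
        return pending != null;
    }

    public static void main(String[] args) {
        RowBufferSketch scanner = new RowBufferSketch(Arrays.asList(
                new Cell("r1", "a"), new Cell("r1", "b"), new Cell("r2", "a")));
        List<Cell> row = new ArrayList<>();
        boolean more;
        do {
            row.clear();
            more = scanner.nextRow(row, 10);
            System.out.println(row + " more=" + more);
        } while (more);
    }
}
```

With a limit smaller than the row width, a call returns only part of the row and the next call continues the same row, which is the partial-row case the original question asks about.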