Drill, mail # dev - Quick overview of HyperBatch concept.

Re: Quick overview of HyperBatch concept.
Timothy Chen 2013-08-07, 02:27
Ah gotcha, it's the same concept that MonetDB uses, and that Hive's
batch query engine uses too. I didn't know it was called HyperBatch
(unless you invented the name?)

On Tue, Aug 6, 2013 at 6:53 PM, Jacques Nadeau <[EMAIL PROTECTED]> wrote:

> Someone was asking me about the HyperBatch concept that a recent
> commit introduced.  The idea is pretty simple.  We currently have a
> two-byte selection vector that we can use to mask a portion of a
> columnar record batch before we rewrite it.  This is to help in
> situations where the rewrite would be unwarranted given the subsequent
> operator.  This works great for non-blocking operators.
>
> In the case of blocking operators such as sort, this becomes a bit
> harder.  (Especially in the case of schema changes, which I won't
> discuss here.)  One solution is generating this new thing called a
> hyperbatch.  It looks kind of like a batch but it carries a
> SelectionVector4 with it.  The SV4 describes not only the valid
> records, but also their location within a set of multiple support
> record batches.  This is encoded as an unsigned two-byte record
> batch index followed by an unsigned two-byte index for the
> individual record (65,536 x 65,536, so about 4B records max).  In
> these cases, a (hyper)batch doesn't hold a
> ValueVector for each field but rather an indexed array of
> ValueVectors.  This allows a pointer sort to be completed without
> rewriting the column-oriented data until required (typically when
> writing to disk or socket).  In the meantime, some additional
> operators can be pipelined with only small modifications.  If we get
> to the point that a particular operator no longer supports an SV4 input
> batch, we insert a SelectionVectorRemover to rewrite the data to the
> more standard record batch format.
>
> You can see an example of the interaction at line 68 of this file:
> https://github.com/apache/incubator-drill/blob/db3afaa854fc8475592907dba97162ecf869f9df/sandbox/prototype/exec/java-exec/src/main/java/org/apache/drill/exec/expr/CodeGenerator.java
>
> thanks,
> Jacques
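
For anyone following along, here is a minimal Java sketch of the SV4
idea described above. It is not Drill's SelectionVector4 code. The
class name Sv4Sketch, its method names, and the use of plain int
arrays to stand in for one field's ValueVectors are all made up for
illustration. It also assumes the batch index sits in the high two
bytes of each entry, which the description suggests by its ordering
but does not state outright.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical illustration only; not Drill's SelectionVector4. */
final class Sv4Sketch {

    /** Packs a batch index and a record index (each 0..65535) into one 4-byte entry. */
    static int pack(int batchIndex, int recordIndex) {
        if (batchIndex < 0 || batchIndex > 0xFFFF
                || recordIndex < 0 || recordIndex > 0xFFFF) {
            throw new IllegalArgumentException("index outside unsigned 16-bit range");
        }
        return (batchIndex << 16) | recordIndex; // assumed layout: batch in the high bytes
    }

    /** Upper two bytes, read as unsigned. */
    static int batchIndex(int entry) {
        return entry >>> 16;
    }

    /** Lower two bytes, read as unsigned. */
    static int recordIndex(int entry) {
        return entry & 0xFFFF;
    }

    /** Builds an SV4 that addresses every record of every batch, in arrival order. */
    static int[] selectAll(int[][] batches) {
        int total = 0;
        for (int[] batch : batches) {
            total += batch.length;
        }
        int[] sv4 = new int[total];
        int next = 0;
        for (int b = 0; b < batches.length; b++) {
            for (int r = 0; r < batches[b].length; r++) {
                sv4[next++] = pack(b, r);
            }
        }
        return sv4;
    }

    /** Pointer sort: reorders SV4 entries by the value each one addresses.
     *  The column data in {@code batches} is never moved or copied. */
    static int[] sortPointers(int[][] batches, int[] sv4) {
        return Arrays.stream(sv4)
                .boxed()
                .sorted(Comparator.comparingInt(
                        (Integer e) -> batches[batchIndex(e)][recordIndex(e)]))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    /** Copies values out in SV4 order into one flat vector, roughly the
     *  job the message assigns to the SelectionVectorRemover. */
    static int[] materialize(int[][] batches, int[] sv4) {
        int[] out = new int[sv4.length];
        for (int i = 0; i < sv4.length; i++) {
            out[i] = batches[batchIndex(sv4[i])][recordIndex(sv4[i])];
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] batches = { {30, 10}, {20, 40} }; // one field, two record batches
        int[] sorted = sortPointers(batches, selectAll(batches));
        System.out.println(Arrays.toString(materialize(batches, sorted)));
        // prints [10, 20, 30, 40]
    }
}
```

The sort only shuffles the 4-byte entries, so the two input arrays
stay exactly as they arrived until materialize() is called. That is
the deferral of the rewrite the message describes.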