Drill >> mail # dev >> Quick overview of HyperBatch concept.


Jacques Nadeau 2013-08-07, 01:53
Re: Quick overview of HyperBatch concept.
Ah gotcha, it's the same concept as in MonetDB, and it's what the Hive
batch query engine is using too. Didn't know they call it HyperBatch
(unless you invented it?)

Tim
On Tue, Aug 6, 2013 at 6:53 PM, Jacques Nadeau <[EMAIL PROTECTED]> wrote:

> Someone was asking me about the HyperBatch concept that a recent
> commit introduced.  The idea is pretty simple.  We currently have a
> two-byte selection vector that we can use to mask a portion of a
> columnar record batch before we rewrite it.  This is to help in
> situations where the rewrite would be unwarranted given the subsequent
> operator.  This works great for non-blocking operators.
>
> In the case of blocking operators such as sort, this becomes a bit
> harder.  (Especially in the case of schema changes, which I won't
> discuss here.)  One solution is generating this new thing called a
> hyperbatch.  It looks kind of like a batch but it carries a
> SelectionVector4 with it.  The SV4 describes not only the valid
> records, but also their location within a set of multiple support
> record batches.  This is encoded as two unsigned bytes for the record
> batch index followed by two unsigned bytes for the individual record
> (about 4 billion records max).  In these cases, a (hyper)batch doesn't hold a
> ValueVector for each field but rather an indexed array of
> ValueVectors.  This allows a pointer sort to be completed without
> rewriting the columnar oriented data until required (typically when
> writing to disk or socket).  In the meantime, some additional
> operators can be pipelined with only small modifications.  If we get
> to the point that a particular operator no longer supports a SV4 input
> batch, we insert a SelectionVectorRemover to rewrite the data to the
> more standard record batch format.
>
> You can see an example of the interaction at line 68 of this file:
>
> https://github.com/apache/incubator-drill/blob/db3afaa854fc8475592907dba97162ecf869f9df/sandbox/prototype/exec/java-exec/src/main/java/org/apache/drill/exec/expr/CodeGenerator.java
>
>
> thanks,
> Jacques
>
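
The SV4 layout described in the quoted message can be sketched roughly as follows. This is only an illustration, not Drill's actual SelectionVector4 code: the class name Sv4Sketch, its methods, and the use of plain int arrays in place of ValueVectors are all assumptions made for the example. Each entry packs a 16-bit batch index into the high half of an int and a 16-bit record index into the low half, and the sort reorders only those entries while the underlying batches stay where they are.

```java
import java.util.Arrays;

/** Hypothetical sketch of a four-byte selection vector; not Drill's real API. */
public class Sv4Sketch {

    /** Packs a batch index and a record index (each 0..65535) into one int. */
    public static int pack(int batchIndex, int recordIndex) {
        if (batchIndex < 0 || batchIndex > 0xFFFF
                || recordIndex < 0 || recordIndex > 0xFFFF) {
            throw new IllegalArgumentException("index outside 16-bit range");
        }
        return (batchIndex << 16) | recordIndex;
    }

    /** Upper two bytes: which supporting batch the record lives in. */
    public static int batchIndex(int entry) {
        return entry >>> 16;
    }

    /** Lower two bytes: the record's position inside that batch. */
    public static int recordIndex(int entry) {
        return entry & 0xFFFF;
    }

    /**
     * Builds one entry per record, then sorts the entries by the value they
     * point to. Plain int arrays stand in for columnar batches (an assumption
     * of this sketch); the batches themselves are never rewritten.
     */
    public static int[] pointerSort(int[][] batches) {
        int total = 0;
        for (int[] batch : batches) {
            total += batch.length;
        }
        Integer[] entries = new Integer[total];
        int n = 0;
        for (int bi = 0; bi < batches.length; bi++) {
            for (int ri = 0; ri < batches[bi].length; ri++) {
                entries[n++] = pack(bi, ri);
            }
        }
        Arrays.sort(entries, (x, y) -> Integer.compare(
                batches[batchIndex(x)][recordIndex(x)],
                batches[batchIndex(y)][recordIndex(y)]));
        int[] sv4 = new int[total];
        for (int i = 0; i < total; i++) {
            sv4[i] = entries[i];
        }
        return sv4;
    }

    public static void main(String[] args) {
        int[][] batches = { {30, 10}, {20, 40, 5} };
        for (int entry : pointerSort(batches)) {
            int bi = batchIndex(entry);
            int ri = recordIndex(entry);
            System.out.println("batch " + bi + ", record " + ri
                    + " -> " + batches[bi][ri]);
        }
    }
}
```

A downstream operator that understands this format can walk the entries in order and read each value through the two indexes; one that cannot would need the data copied out into a regular batch first, which is the role the message assigns to the SelectionVectorRemover.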
Jacques Nadeau 2013-08-07, 02:47