Drill, mail # dev - buffer allocation of cast into var length type


Re: buffer allocation of cast into var length type
Jason Altekruse 2013-12-04, 02:29
Hi Jinfeng,

This might be a dumb question, but is there any transformation being
performed when going from a fixed length type to a variable length type?
That is, are the bytes in the buffer coming in going to be the same as the
bytes coming out of the cast?

I understand that for casts like int -> long we need to add extra space
between each value, but is it possible that we could just hand the buffer
from one value vector type to the other without copying it into a new
buffer?

We would still have to create a new buffer with the offsets of the
"variable length" values, but it would save us some time if we could do
this.
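
To make the idea concrete, here is a minimal sketch of handing a fixed-width
data buffer over unchanged and building only the offsets. It assumes the cast
leaves the value bytes untouched (perhaps true for something like bigint ->
varbinary, not for int -> varchar). The class and method names are
hypothetical, and java.nio.ByteBuffer stands in for Drill's own buffer classes.

```java
import java.nio.ByteBuffer;

final class FixedAsVarLen {
    // Offsets for `count` values of `width` bytes each: count + 1 entries,
    // where value i occupies bytes [offsets[i], offsets[i + 1]).
    static int[] buildOffsets(int count, int width) {
        int[] offsets = new int[count + 1];
        for (int i = 0; i <= count; i++) {
            offsets[i] = i * width;
        }
        return offsets;
    }

    // Reads value i through the offsets, as a var-length reader would.
    // Only this helper copies bytes out; the data buffer itself is shared.
    static byte[] valueAt(ByteBuffer data, int[] offsets, int i) {
        byte[] out = new byte[offsets[i + 1] - offsets[i]];
        ByteBuffer view = data.duplicate(); // independent position, same bytes
        view.position(offsets[i]);
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.allocate(3 * Long.BYTES);
        data.putLong(1L).putLong(2L).putLong(3L);
        data.flip();

        int[] offsets = buildOffsets(3, Long.BYTES);
        byte[] second = valueAt(data, offsets, 1);
        System.out.println(second.length + " bytes, last byte = " + second[7]);
    }
}
```

The only new allocation is the offsets array; the value bytes are never
rewritten.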

-Jason Altekruse
On Tue, Dec 3, 2013 at 5:35 PM, Jinfeng Ni <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I'm working on explicit cast support in Drill. So far, I have prototyped
> the implementation for the first 3 categories, and would like to seek input
> from you regarding how to deal with the buffer allocation for cast from
> fixed-length type into var-length type.
>
> 1. cast from fixed-length type to fixed-length type
> eg:   float4 --> int,
>         int -> float4,
>
> 2. cast from var-length type to fixed-length type
> eg: varchar --> int
>       varbinary --> int
> (I still need to figure out how to handle overflow during the cast.)
>
> 3. cast from fixed-length type to var-length type
> eg:  int  -> varchar
>        bigint -> varbinary
>
> 4. cast from var-length type to var-length type
> eg:   varchar --> varchar
>         varbinary --> varchar
>
> For the 3rd one, i.e. from fixed-length to var-length type, buffer
> allocation is a problem in the current implementation.
>
> For fixed-length types, Drill uses a Java primitive type in the ValueHolder.
> For instance, IntHolder.value is an int.  But for var-length types, Drill
> uses a buffer to hold the value. When casting from int to varchar, the
> buffer for the VarCharHolder is not allocated, so we have to figure out a
> way to allocate it before the cast.
>
> There seem to be 2 options:
> Option 1:  allocate the buffer in the function template's setup() method.
> The buffer will then be used in the eval() method.
> Problems with this option:
> 1) It needs two copies: first from the fixed-type input into the buffer
> allocated in setup(), then from that buffer into the buffer in the
> target vector.
> 2) It needs a cleanup() method on the function template to release the
> allocated buffer, which does not currently exist in the code base.
>
> Option 2:  the consumer of the cast function's output is responsible for
> pre-allocating the buffer in the target ValueVector for all the
> VarCharHolder values.  The cast function simply does the conversion and
> copies into the pre-allocated buffer in the target ValueVector.
> The advantage of this option is that it requires only 1 copy.
>
> I have prototyped the 1st option, and have not figured out how to implement
> the 2nd approach yet. I would like to get suggestions on these 2 options
> before I proceed.
>
> Thanks!
>
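
The two options in the quoted message can be compared in a small sketch. This
is not Drill code: the class and method names are hypothetical,
java.nio.ByteBuffer stands in for Drill's buffers, and it assumes int ->
varchar produces the decimal text of the value. Under that assumption an int
needs at most 11 bytes ("-2147483648"), which gives the consumer in Option 2
an upper bound for pre-allocation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CastIntToVarChar {
    // Longest decimal form of an int: "-2147483648".
    static final int MAX_INT_CHARS = 11;

    // Option 1: convert into a scratch buffer (allocated once, as in setup()),
    // then copy the scratch buffer into the target. Returns bytes written.
    static int castViaScratch(int value, ByteBuffer scratch, ByteBuffer target) {
        byte[] digits = Integer.toString(value).getBytes(StandardCharsets.US_ASCII);
        scratch.clear();
        scratch.put(digits);  // copy 1: input -> scratch
        scratch.flip();
        target.put(scratch);  // copy 2: scratch -> target
        return digits.length;
    }

    // Option 2: the caller pre-allocated the target, so the converted bytes
    // go straight into it. Returns bytes written.
    static int castDirect(int value, ByteBuffer target) {
        byte[] digits = Integer.toString(value).getBytes(StandardCharsets.US_ASCII);
        target.put(digits);   // single copy: input -> target
        return digits.length;
    }

    public static void main(String[] args) {
        int[] input = {7, -42, 2013};
        // Consumer-side pre-allocation using the per-value upper bound.
        ByteBuffer target = ByteBuffer.allocate(input.length * MAX_INT_CHARS);
        int[] offsets = new int[input.length + 1];
        for (int i = 0; i < input.length; i++) {
            offsets[i + 1] = offsets[i] + castDirect(input[i], target);
        }
        System.out.println(Arrays.toString(offsets)); // prints [0, 1, 4, 8]
    }
}
```

The intermediate byte[] from Integer.toString is incidental to the sketch; the
point is that Option 1 touches the value bytes one extra time and needs the
scratch buffer released later, while Option 2 shifts sizing to the consumer.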