Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Drill, mail # dev - Implementation of the upper() function

Copy link to this message
Re: Implementation of the upper() function
Ben Becker 2013-08-26, 20:02
Hi Tanujit,

Responses in-line:

On Sat, Aug 24, 2013 at 4:35 AM, Tanujit Ghosh <[EMAIL PROTECTED]>wrote:

> Hi All,
> I put together a simple implementation of the upper() string function.
> Few questions
> 1. Is this the correct way, just wanted to understand to understand the
> ByteBuf and whether we should be doing the setBytes with the string.
> Another obvious optimization that can consider the byte as number and then
> intrepret to ascii value and if it falls int he lower change change the
> value to higher value.
> Just wanted to know what is the way that would be better.

The supplied code appears correct.  Note that the current default encoding
for Drill is UTF-8, while the Java String is UTF-16.  Ideally, the upper()
function would operate directly on the UTF-8 encoded ByteBufs and avoid
allocating a temporary string.  But this seems just fine for now.
> 2. It would be helpful to get an explainatin on the bytebuf and how its
> wrapped around in the vector and holder objects and how this is supposed to
> be shared.

A *ByteBuf* is a pooled, reference-counted resource (we actually use the *
A *ValueVector* reference a ByteBuf, which stores a vector of values for a
single field.
A *VectorHolder* can be thought of as a reference to a single value in the
When a ValueVector is 'done' with the ByteBuf, the buffer's reference count
is decremented and can be reclaimed in the pool.  See
>   @FunctionTemplate(name = "uppercase", scope > FunctionTemplate.FunctionScope.SIMPLE,
>               nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
>   public static class UpperString implements DrillFunc {
>     @Param  VarCharHolder input;
>     @Output VarCharHolder out;
>     public void setup(RecordBatch b){}
>     public void eval(){
>       out.buffer = input.buffer;
>       out.start = input.start;
>       out.end = input.end;
>       String inputStr = input.toString().toUpperCase();
>       out.buffer.setBytes(input.start, inputStr.getBytes());
>     }
>     public static class Provider implements CallProvider {
>       @Override
>       public FunctionDefinition[] getFunctionDefintions() {
>         return new FunctionDefinition[] {
>           FunctionDefinition.simple("uppercase",
>             new BasicArgumentValidator(new
> Arg(Types.required(TypeProtos.MinorType.VARCHAR),
>  Types.required(TypeProtos.MinorType.VAR16CHAR))),

I'm not sure you need to worry about VAR16CHAR for now, but you may want to
allow Types.optional(VARCHAR).  The NULL_IF_NULL declaration will handle
null values automatically.
>             new OutputTypeDeterminer.SameAsFirstInput(),
>             "upper")
>         };
>       }
>     }
>   }
> Thanks and Regards,
> Tanujit