Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Drill >> mail # dev >> Implementation of the upper() function


Copy link to this message
-
Re: Implementation of the upper() function
Hi Tanujit,

Responses in-line:

On Sat, Aug 24, 2013 at 4:35 AM, Tanujit Ghosh <[EMAIL PROTECTED]>wrote:

> Hi All,
>
> I put together a simple implementation of the upper() string function.
>
> Few questions
>
> 1. Is this the correct way, just wanted to understand to understand the
> ByteBuf and whether we should be doing the setBytes with the string.
> Another obvious optimization that can consider the byte as number and then
> intrepret to ascii value and if it falls int he lower change change the
> value to higher value.
> Just wanted to know what is the way that would be better.
>

The supplied code appears correct.  Note that the current default encoding
for Drill is UTF-8, while the Java String is UTF-16.  Ideally, the upper()
function would operate directly on the UTF-8 encoded ByteBufs and avoid
allocating a temporary string.  But this seems just fine for now.
> 2. It would be helpful to get an explainatin on the bytebuf and how its
> wrapped around in the vector and holder objects and how this is supposed to
> be shared.
>

A *ByteBuf* is a pooled, reference-counted resource (we actually use the *
PooledByteBuf*).
A *ValueVector* reference a ByteBuf, which stores a vector of values for a
single field.
A *VectorHolder* can be thought of as a reference to a single value in the
ValueVector.
When a ValueVector is 'done' with the ByteBuf, the buffer's reference count
is decremented and can be reclaimed in the pool.  See
BaseDataValueVector.clear().
>
>
>   @FunctionTemplate(name = "uppercase", scope > FunctionTemplate.FunctionScope.SIMPLE,
>               nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
>   public static class UpperString implements DrillFunc {
>
>     @Param  VarCharHolder input;
>     @Output VarCharHolder out;
>
>     public void setup(RecordBatch b){}
>
>     public void eval(){
>       out.buffer = input.buffer;
>       out.start = input.start;
>       out.end = input.end;
>
>       String inputStr = input.toString().toUpperCase();
>       out.buffer.setBytes(input.start, inputStr.getBytes());
>     }
>
>     public static class Provider implements CallProvider {
>       @Override
>       public FunctionDefinition[] getFunctionDefintions() {
>         return new FunctionDefinition[] {
>           FunctionDefinition.simple("uppercase",
>             new BasicArgumentValidator(new
> Arg(Types.required(TypeProtos.MinorType.VARCHAR),
>
>  Types.required(TypeProtos.MinorType.VAR16CHAR))),
>

I'm not sure you need to worry about VAR16CHAR for now, but you may want to
allow Types.optional(VARCHAR).  The NULL_IF_NULL declaration will handle
null values automatically.
>             new OutputTypeDeterminer.SameAsFirstInput(),
>             "upper")
>         };
>       }
>     }
>   }
>
>
> Thanks and Regards,
> Tanujit
>
Regards,
-Ben
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB