
Drill >> mail # dev >> Query regarding Scalar Functions implementation

RE: Query regarding Scalar Functions implementation

I'm not sure if we're talking about implementation specifics or what would be visible to end user.

But if this is about end user experience, I'd say the functionality should reflect the one in Oracle:


Reports the datatype as "NUMBER(precision, scale)".

So ... I guess we don't have the luxury of "NUMBER(p,s)". But somehow we should not force users to do explicit casts to achieve the "obvious results" (as defined above :) ).

-----Original Message-----
From: Jason Altekruse [mailto:[EMAIL PROTECTED]]
Sent: October 2, 2013 1:57
To: drill-dev
Subject: Re: Query regarding Scalar Functions implementation

Hello All,

I would assume we would want to follow the conventions of most programming languages. If users are interested in a decimal result, they would have to explicitly cast one of the arguments to a float or float8.
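As a point of reference, this is exactly how Java itself behaves (a minimal illustration; the class name is just for this sketch):

```java
public class DivisionDemo {
    public static void main(String[] args) {
        // Integer division truncates, as in most programming languages.
        int truncated = 7 / 2;            // 3

        // An explicit cast on one operand promotes the whole expression,
        // which is the behavior a user opts into with cast(... as float8).
        double promoted = (double) 7 / 2; // 3.5

        System.out.println(truncated + " " + promoted);
    }
}
```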

In regards to mismatched types, there are two ways I can think of doing it.
We could define a bunch of overloaded methods for each combination, but it seems like we would have to define each one twice for the different arrangements of the types, such as with mult(float, float8) and mult(float8, float).

I think the way we will want to do it is add additional logic to the code generation portion of the query, rather than define a bunch of different functions.

For example, as new batches arrive at an operator, if they have a new schema we generate code to process the particular types of value vectors involved in the operation. I think at this step we should be able to add a cast to one of the parameters to direct to a function that defines an operation between two operands of the same type.

incoming types int, float
- cast first parameter to a float

Deciding which one to cast seems to be pretty standard, as seen in the SQL Server documentation: they just define a strict hierarchy of types.
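A sketch of what that hierarchy-based choice could look like at code-generation time (the type names and the `commonType` helper are hypothetical, not Drill's actual API; the precedence order is just an example loosely modeled on the SQL Server idea):

```java
import java.util.List;

public class ImplicitCast {
    // Hypothetical precedence list, highest first. A real implementation
    // would enumerate every minor type Drill supports.
    static final List<String> PRECEDENCE = List.of("FLOAT8", "FLOAT4", "BIGINT", "INT");

    // Pick the common type two mismatched operands should be cast to:
    // whichever operand type sits higher in the precedence list wins,
    // and a cast is inserted on the other operand.
    static String commonType(String left, String right) {
        return PRECEDENCE.indexOf(left) <= PRECEDENCE.indexOf(right) ? left : right;
    }

    public static void main(String[] args) {
        // incoming types int, float -> cast the int side up to float
        System.out.println(commonType("INT", "FLOAT4")); // FLOAT4
    }
}
```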


The only problem I could see with this approach is that the Drill Funcs take the value holders as parameters, so we will have to define casting rules between the various types. Not sure what this will do for code inlining. A major goal of the templates and code generation was allowing UDFs while keeping the whole system fast.
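For concreteness, a holder-to-holder casting rule might look like the sketch below. The holder classes here are simplified stand-ins (Drill's real generated holders live elsewhere), and `castToFloat8` is a hypothetical name:

```java
public class HolderCast {
    // Simplified stand-ins for Drill's generated value holders.
    static class IntHolder { int value; }
    static class Float8Holder { double value; }

    // A hypothetical widening rule: IntHolder -> Float8Holder.
    // Kept small and static so the JIT has a good chance of inlining it,
    // which is the inlining concern raised above.
    static Float8Holder castToFloat8(IntHolder in) {
        Float8Holder out = new Float8Holder();
        out.value = (double) in.value;
        return out;
    }

    public static void main(String[] args) {
        IntHolder i = new IntHolder();
        i.value = 7;
        System.out.println(castToFloat8(i).value);
    }
}
```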

It would also be possible to define additional methods on the various value vectors to allow extraction of values directly into different types, such as a double extraction method on the float vectors. This might aid inlining, as we handle a bit more of the logic while dealing with primitives (rather than pulling out a value, sticking it in a holder object and then casting the holder to a different object type).
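That extraction idea could be sketched like this (a toy vector backed by an array rather than Drill's direct buffers; `getDouble` is the hypothetical widening accessor):

```java
public class FloatVectorSketch {
    // Minimal stand-in for a float value vector.
    private final float[] data;

    FloatVectorSketch(float[] data) { this.data = data; }

    // Standard accessor returning the stored primitive type.
    float get(int index) { return data[index]; }

    // Widening accessor: callers needing a double never materialize an
    // intermediate float holder, so the cast stays on primitives.
    double getDouble(int index) { return (double) data[index]; }

    public static void main(String[] args) {
        FloatVectorSketch v = new FloatVectorSketch(new float[]{1.5f, 2.5f});
        System.out.println(v.getDouble(1));
    }
}
```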

On Tue, Oct 1, 2013 at 1:06 AM, Yash Sharma <[EMAIL PROTECTED]>wrote:

> Hi Team,
> I had two questions regarding the implementation of Scalar Functions.
> 1. What would be the Output type of Division func (given: Input types
> are all Integers)
> Currently I have provided an implementation of the DIVISION func which
> has input/output params as :
>         @Param  IntHolder left;
>         @Param  IntHolder right;
>         @Output IntHolder out;
> now, the issue is the data type of output field:
> output type will be integer if left & right are divisible integers, while..
> output type would be decimal if left & right are non-divisible
> integers (i.e.  have a remainder)
> So my question is,
> Do I have to provide 3 overloaded methods for division with different
> @output types, (IntHolder, Float4Holder, Float8Holder) ?
> or shall I have a  Float8 output type irrespective of the inputs?
> Other functions like add/multiply & subtract won't have this issue;
> it's only an issue with division.
> 2. What would be the input type for any Scalar func (given: Input
> types might not always be Integers).