Drill, mail # dev - Query regarding Scalar Functions implementation

Re: Query regarding Scalar Functions implementation
Jacques Nadeau 2013-10-02, 15:46
I haven't run the test yet on MSSQL, but reading this suggests that there
int/int == int, as opposed to Oracle, where int/int == float4.
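
To make the two options concrete, here is a minimal Java sketch. The class and method names are made up for illustration and are not Drill code; the operands 5 and 2 are just sample values.

```java
// Hypothetical illustration of the two candidate semantics; not Drill code.
final class DivisionSemantics {

    // int / int == int: the fractional part is dropped (Java truncates toward zero).
    static int divideAsInt(int left, int right) {
        return left / right;
    }

    // int / int == float4: one operand is widened first, so the fraction survives.
    static float divideAsFloat(int left, int right) {
        return (float) left / right;
    }

    public static void main(String[] args) {
        System.out.println(divideAsInt(5, 2));   // prints 2
        System.out.println(divideAsFloat(5, 2)); // prints 2.5
    }
}
```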


We should probably pick one and stick to it.  I personally prefer MS but
Oracle is more prevalent.  If I remember correctly, Optiq is modeled more
after one of the two and we should probably continue that trend.  Maybe
Julian can comment here...

On Wed, Oct 2, 2013 at 1:42 AM, Harri Kinnunen

> Hi,
> I'm not sure if we're talking about implementation specifics or what would
> be visible to end user.
> But if this is about end user experience, I'd say the functionality should
> reflect the one in Oracle:
> Returns
> 2.5
> Even:
> returns:
> 3.33333333333333
> Doing:
> Reports the datatype as "NUMBER(precision, scale)".
> So ... I guess we don't have the luxury of "NUMBER(p,s)". But somehow we
> should not force users to do explicit casts to achieve the "obvious
> results" (as defined above :) ).
> Cheers,
> Harri
> -----Original Message-----
> From: Jason Altekruse [mailto:[EMAIL PROTECTED]]
> Sent: 2 October 2013 1:57
> To: drill-dev
> Subject: Re: Query regarding Scalar Functions implementation
> Hello All,
> I would assume we would want to follow the conventions of most programming
> languages. If users are interested in a decimal result, they would have to
> explicitly cast one of the arguments to a float or float8.
> With regard to mismatched types, there are two ways I can think of doing it.
> We could define a bunch of overloaded methods for each combination, but it
> seems like we have to define each twice for different arrangements of the
> types, such as with mult(float, float8) and mult(float8, float).
> I think the way we will want to do it is add additional logic to the code
> generation portion of the query, rather than define a bunch of different
> functions.
> For example, as new batches arrive at an operator, if they have a new
> schema we generate code to process the particular types of value vectors
> involved in the operation. I think at this step we should be able to add a
> cast to one of the parameters to direct to a function that defines an
> operation between two operands of the same type.
> Example:
> incoming types int, float
> - cast first parameter to a float
> Deciding which one to cast seems to be pretty standard, as seen here in
> the SQL Server documentation. They just define a strict hierarchy of types.
> http://technet.microsoft.com/en-us/library/ms190309.aspx
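
The "cast the lower-ranked operand" rule can be sketched in Java as below. This is only an illustration under assumptions: the enum, its four members, their ordering, and the method names are hypothetical and are not taken from Drill or from the linked SQL Server page.

```java
// Hypothetical sketch of a strict type hierarchy; names and ordering are assumptions.
final class TypePromotion {

    // Declared from lowest to highest precedence (assumed order for illustration).
    enum NumericType { INT, BIGINT, FLOAT4, FLOAT8 }

    // The common type of a binary operation is whichever operand ranks higher.
    // Enum.compareTo compares declaration order, which encodes the hierarchy here.
    static NumericType commonType(NumericType left, NumericType right) {
        return left.compareTo(right) >= 0 ? left : right;
    }

    // An operand needs a generated cast only when it is not already the common type.
    static boolean needsCast(NumericType operand, NumericType common) {
        return operand != common;
    }
}
```

With incoming types int and float, commonType returns FLOAT4, so only the first parameter needs a cast and the call can be routed to a single same-type function.
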
> The only problem I could see with this approach is that the Drill Funcs
> take the value holders as parameters, so we will have to define casting
> rules between the various types. Not sure what this will do for code
> inlining. A major goal of the templates and code generation was allowing
> UDFs while keeping the whole system fast.
> It would also be possible to define additional methods on the various
> value vectors to allow extraction of values directly into different types,
> such as a double extraction method on the float vectors. This might aid
> inlining, as we handle a bit more of the logic while dealing with
> primitives (rather than pulling out a value, sticking it in a holder object
> and then casting the holder to a different object type).
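
A rough Java sketch of that second idea follows. The class below is a stand-in backed by a plain array, and both accessor names are invented for illustration; it does not reflect the actual value vector classes in Drill.

```java
// Hypothetical stand-in for a float4 value vector; not Drill's real vector API.
final class SimpleFloat4Vector {
    private final float[] values;

    SimpleFloat4Vector(float[] values) {
        this.values = values.clone();
    }

    // Native accessor: returns the stored float unchanged.
    float get(int index) {
        return values[index];
    }

    // Widening accessor: hands back a double primitive directly, so generated
    // code never has to fill a float holder and then cast the holder.
    double getAsDouble(int index) {
        return values[index];
    }
}
```

Because both accessors work on primitives, the widening is an ordinary float-to-double conversion that a JIT compiler can usually inline, which is the benefit the paragraph above is hoping for.
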
> -Jason
> On Tue, Oct 1, 2013 at 1:06 AM, Yash Sharma <[EMAIL PROTECTED]> wrote:
> > Hi Team,
> > I had two questions regarding the implementation of Scalar Functions.
> >
> > 1. What would be the output type of the Division func (given that the
> > input types are all integers)?
> >
> > Currently I have provided an implementation of the DIVISION func which
> > has input/output params as:
> >         @Param  IntHolder left;
> >         @Param  IntHolder right;
> >         @Output IntHolder out;