Drill >> mail # dev >> Query regarding Scalar Functions implementation

Yash Sharma 2013-10-01, 06:06
Jason Altekruse 2013-10-01, 22:57
Jacques Nadeau 2013-10-02, 15:46
Harri Kinnunen 2013-10-02, 08:42
Jacques Nadeau 2013-10-02, 15:46
Julian Hyde 2013-10-02, 16:03
RE: Query regarding Scalar Functions implementation
Harri -
I was talking about the implementation of the functions rather than the output shown to the user. I can pick that up once the implementation is done.
Jason -
I am also worried about overloaded functions; they would lead to a lot of functions.
Code generation definitely looks like an interesting approach.
Could you point me to the Java class that handles code generation for queries? I can debug it and learn a little more about it.

From: Harri Kinnunen [[EMAIL PROTECTED]]
Sent: Wednesday, October 02, 2013 2:12 PM
Subject: RE: Query regarding Scalar Functions implementation


I'm not sure if we're talking about implementation specifics or what would be visible to end user.

But if this is about end user experience, I'd say the functionality should reflect the one in Oracle:


Reports the datatype as "NUMBER(precision, scale)".

So ... I guess we don't have the luxury of "NUMBER(p,s)". But somehow we should not force users to do explicit casts to achieve the "obvious results" (as defined above :) ).

-----Original Message-----
From: Jason Altekruse [mailto:[EMAIL PROTECTED]]
Sent: October 2, 2013 1:57
To: drill-dev
Subject: Re: Query regarding Scalar Functions implementation

Hello All,

I would assume we would want to follow the conventions of most programming languages. If users are interested in a decimal result, they would have to explicitly cast one of the arguments to a float or float8.
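
For reference, this is how that convention behaves in Java itself. It is a minimal sketch in plain Java, not Drill code; the class and method names are made up for illustration.

```java
// Illustrative only: plain Java arithmetic, not Drill code.
class IntDivisionDemo {
    // Two integer operands give an integer result; the fraction is truncated.
    static int divideInts(int left, int right) {
        return left / right;
    }

    // Casting one operand to float promotes the whole expression to float.
    static float divideWithCast(int left, int right) {
        return (float) left / right;
    }

    public static void main(String[] args) {
        System.out.println(divideInts(7, 2));      // prints 3
        System.out.println(divideWithCast(7, 2));  // prints 3.5
    }
}
```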

Regarding mismatched types, there are two ways I can think of doing it.
We could define a bunch of overloaded methods for each combination, but it seems like we would have to define each one twice for the different arrangements of the types, such as mult(float, float8) and mult(float8, float).
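
To make the cost of the overload approach concrete, here is a rough sketch for just two types. It assumes float maps to Java's float and float8 to Java's double; the class name is hypothetical and the methods use plain primitives rather than Drill's value holders.

```java
// Illustrative sketch: one overload per ordered pair of argument types.
// Assumption: float8 is modeled as Java double; this is not Drill code.
class OverloadSketch {
    static float mult(float a, float b) { return a * b; }

    // The mixed pairs differ only in argument order, yet both must exist.
    static double mult(float a, double b) { return a * b; }
    static double mult(double a, float b) { return a * b; }

    static double mult(double a, double b) { return a * b; }

    public static void main(String[] args) {
        System.out.println(mult(2f, 3.5d)); // resolves to mult(float, double)
        System.out.println(mult(3.5d, 2f)); // resolves to mult(double, float)
    }
}
```

Two types already need four overloads per operator; covering n types this way takes n * n of them.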

I think the way we will want to do it is add additional logic to the code generation portion of the query, rather than define a bunch of different functions.

For example, as new batches arrive at an operator, if they have a new schema we generate code to process the particular types of value vectors involved in the operation. I think at this step we should be able to add a cast to one of the parameters to direct to a function that defines an operation between two operands of the same type.

incoming types int, float
- cast first parameter to a float

Deciding which one to cast seems to be pretty standard, as seen in the SQL Server documentation. They just define a strict hierarchy of types.
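
A strict hierarchy like that could be sketched as follows. The type names, their ordering, and the method names are assumptions made for illustration; this is neither Drill's type system nor SQL Server's full precedence list.

```java
// Illustrative sketch of picking the cast from a strict type hierarchy.
// The enum members and their order are assumptions, not Drill's real types.
class ImplicitCastSketch {
    // Declared from lowest to highest precedence; ordinal() encodes the rank.
    enum NumericType { INT, BIGINT, FLOAT4, FLOAT8 }

    // The operand with the lower rank is cast up to the higher-ranked type.
    static NumericType commonType(NumericType left, NumericType right) {
        return left.ordinal() >= right.ordinal() ? left : right;
    }

    // Describes which cast, if any, code generation would need to insert.
    static String planCast(NumericType left, NumericType right) {
        NumericType target = commonType(left, right);
        if (left != target) {
            return "cast first parameter to " + target;
        }
        if (right != target) {
            return "cast second parameter to " + target;
        }
        return "no cast needed";
    }

    public static void main(String[] args) {
        System.out.println(planCast(NumericType.INT, NumericType.FLOAT4));
        System.out.println(planCast(NumericType.FLOAT8, NumericType.FLOAT4));
    }
}
```

With this in place, each operator only needs one function per type, since both operands share a type by the time the function is called.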


The only problem I could see with this approach is that the Drill Funcs take the value holders as parameters, so we will have to define casting rules between the various types. Not sure what this will do for code inlining. A major goal of the templates and code generation was allowing UDFs while keeping the whole system fast.

It would also be possible to define additional methods on the various value vectors to allow extraction of values directly into different types, such as a double extraction method on the float vectors. This might aid inlining, as we handle a bit more of the logic while dealing with primitives (rather than pulling out a value, sticking it in a holder object and then casting the holder to a different object type).
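
The direct-extraction idea might look roughly like this. The class below is a hypothetical stand-in for a float value vector, backed by a plain array; it does not reflect Drill's actual vector classes or accessor names.

```java
// Hypothetical stand-in for a float value vector; not Drill's real class.
class FloatVectorSketch {
    private final float[] values;

    FloatVectorSketch(float... values) {
        this.values = values.clone();
    }

    // Regular accessor: returns the value in its stored type.
    float get(int index) {
        return values[index];
    }

    // Widening accessor: returns a double primitive directly, so the caller
    // never has to fill a float holder and convert it to a double holder.
    double getAsDouble(int index) {
        return values[index];
    }

    public static void main(String[] args) {
        FloatVectorSketch vector = new FloatVectorSketch(1.5f, 2.25f);
        double product = vector.getAsDouble(0) * 2.0d;
        System.out.println(product); // prints 3.0
    }
}
```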

On Tue, Oct 1, 2013 at 1:06 AM, Yash Sharma <[EMAIL PROTECTED]> wrote:

> Hi Team,
> I had two questions regarding the implementation of Scalar Functions.
> 1. What would be the Output type of Division func (given: Input types
> are all Integers)
> Currently I have provided an implementation of the DIVISION func which
> has input/output params as :
>         @Param  IntHolder left;
>         @Param  IntHolder right;
>         @Output IntHolder out;
