RE: UDFs and types
Paolo [CC'ed] observed that currently, if the return type of the UDF is
a bag or a tuple, the contents of the bag/tuple are not known at type
checking time. In addition to the input parameter types, the return type
of the UDF should also be declared as a schema. This will make the inputs
and outputs well defined and help the type checker enforce type checking
and promotion.

I found a paper that describes algorithms for fast type inclusion
tests (checking whether a type is a subtype of another type).

http://www.cs.purdue.edu/homes/jv/pubs/oopsla97.pdf

Santhosh

-----Original Message-----
From: pi song [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 07, 2008 5:58 AM
To: [EMAIL PROTECTED]
Subject: Re: UDFs and types

You're right. The real problem will be defining rules.

How about?
0) We do only non-nested types first.
1) All number types can be cast to bigger types
    int -> long -> float -> double
2) bytearray can be cast to chararray or double (chararray takes
precedence)
3) Matches on the left are more important than on the right. For
example:

Input:-
(int, long)

Candidates:-
(int, float)
(float, long)

will match (int, float)
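
A rough sketch of rules 0-3 above in Java, for illustration only and not code from Pig. The names BestFit, Type, castCost and bestFit are made up here. It assumes that "matches on the left are more important" means comparing per-argument cast costs left to right, and that a cast to a nearer type in the widening chain costs less than a cast to a farther one; neither assumption is settled in this thread.

```java
import java.util.Arrays;
import java.util.List;

public class BestFit {
    // Non-nested types only (rule 0).
    enum Type { INT, LONG, FLOAT, DOUBLE, CHARARRAY, BYTEARRAY }

    // Widening order from rule 1: int -> long -> float -> double.
    private static final List<Type> NUMERIC =
            Arrays.asList(Type.INT, Type.LONG, Type.FLOAT, Type.DOUBLE);

    // Cost of casting 'from' to 'to': 0 is an exact match, larger is a
    // longer cast, -1 means the cast is not allowed by the rules.
    static int castCost(Type from, Type to) {
        if (from == to) return 0;
        int f = NUMERIC.indexOf(from);
        int t = NUMERIC.indexOf(to);
        if (f >= 0 && t > f) return t - f;        // rule 1: widen only
        if (from == Type.BYTEARRAY) {
            if (to == Type.CHARARRAY) return 1;   // rule 2: chararray preferred
            if (to == Type.DOUBLE) return 2;
        }
        return -1;
    }

    // Left-to-right comparison of two cost vectors of equal length (rule 3).
    private static boolean better(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) return a[i] < b[i];
        }
        return false;
    }

    // Returns the best-fitting candidate, or null if none fits.
    static List<Type> bestFit(List<Type> input, List<List<Type>> candidates) {
        List<Type> best = null;
        int[] bestCosts = null;
        for (List<Type> candidate : candidates) {
            if (candidate.size() != input.size()) continue;
            int[] costs = new int[input.size()];
            boolean fits = true;
            for (int i = 0; i < costs.length && fits; i++) {
                costs[i] = castCost(input.get(i), candidate.get(i));
                fits = costs[i] >= 0;
            }
            if (fits && (bestCosts == null || better(costs, bestCosts))) {
                best = candidate;
                bestCosts = costs;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Type> input = Arrays.asList(Type.INT, Type.LONG);
        List<List<Type>> candidates = Arrays.asList(
                Arrays.asList(Type.INT, Type.FLOAT),
                Arrays.asList(Type.FLOAT, Type.LONG));
        System.out.println(bestFit(input, candidates)); // prints [INT, FLOAT]
    }
}
```

With the example above, (int, float) costs [0, 1] and (float, long) costs [2, 0], so the left-to-right comparison picks (int, float).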

On Fri, Jul 4, 2008 at 1:42 AM, Benjamin Reed <[EMAIL PROTECTED]>
wrote:

> You rock Pi!
>
> It might be good to agree on best-fit rules. There are obvious ones: int
> -> long, float -> double, but what about long -> int, long -> float, and
> string -> float.
>
> There are also the recursive fits, which might be purely theoretical:
> tuples of the form (long, {float}) fit to (double, {long}) or (int,
> {long}). (That example might be invalid depending on the first answer,
> but hopefully you get the idea.)
>
> ben
>
> pi song wrote:
> > +1 Agree.
> >
> > I will try to make "best fit" happen in 24 hours after you commit the new
> > UDF design.
> >
> >
> > On Thu, Jul 3, 2008 at 6:55 AM, Olga Natkovich <[EMAIL PROTECTED]> wrote:
> >
> >
> >> Sounds good to me.
> >>
> >> Olga
> >>
> >>
> >>> -----Original Message-----
> >>> From: Alan Gates [mailto:[EMAIL PROTECTED]]
> >>> Sent: Wednesday, July 02, 2008 1:44 PM
> >>> To: [EMAIL PROTECTED]
> >>> Subject: UDFs and types
> >>>
> >>> With the introduction of types (see
> >>> http://issues.apache.org/jira/browse/PIG-157) we need to
> >>> decide how EvalFunc will interact with the types.  The
> >>> original proposal was that the DEFINE keyword would be
> >>> modified to allow specification of types for the UDF.  This
> >>> has a couple of problems.  One, DEFINE is already used to
> >>> specify constructor arguments.  Using it to also specify
> >>> types will be confusing.  Two, it has been pointed out that
> >>> this type information is a property of the UDF and should
> >>> therefore be declared by the UDF, not in the script.
> >>>
> >>> Separately, as a way to allow simple function overloading, a
> >>> change had been proposed to the EvalFunc interface to allow
> >>> an EvalFunc to specify that for a given type, a different
> >>> instance of EvalFunc should be used (see
> >>> https://issues.apache.org/jira/browse/PIG-276).
> >>>
> >>> I would like to propose that we expand the changes in PIG-276
> >>> to be more general.  Rather than adding classForType() as
> >>> proposed in PIG-276, EvalFunc will instead add a function:
> >>>
> >>> public Map<Schema, FuncSpec> getArgToFuncMapping() {
> >>>     return null;
> >>> }
> >>>
> >>> Where FuncSpec is a new class that contains the name of the
> >>> class that implements the UDF along with any necessary
> >>> arguments for the constructor.
> >>>
> >>> The type checker will then, as part of type checking
> >>> LOUserFunc, make a call to this function.  If it receives a
> >>> null, it will simply leave the UDF as is, and make the
> >>> assumption that the UDF can handle whatever datatype is being
> >>> provided to it.  This will cover most existing UDFs, which
> >>> will not override the default implementation.
> >>>
> >>> If a UDF wants to override the default, it should return a
> >>> map that gives a FuncSpec for each type of schema that it can
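
To make the getArgToFuncMapping() proposal quoted above more concrete, here is a minimal sketch of how the default and an override might look, and how a type checker might use the result. It is not Pig code. The EvalFunc and FuncSpec classes below are simplified stand-ins, a schema is approximated by a list of type names, and Abs, IntAbs, DoubleAbs and resolve() are hypothetical names. Throwing on an unmatched schema is also an assumption; the thread discusses best-fit matching for that case.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ArgToFuncSketch {
    // Stand-in for the proposed FuncSpec: implementing class name plus
    // any constructor arguments.
    static final class FuncSpec {
        final String className;
        final String[] ctorArgs;
        FuncSpec(String className, String... ctorArgs) {
            this.className = className;
            this.ctorArgs = ctorArgs;
        }
    }

    // Stand-in for EvalFunc. A schema is approximated as List<String>.
    static class EvalFunc {
        // Default from the proposal: null means "I accept any input type".
        public Map<List<String>, FuncSpec> getArgToFuncMapping() {
            return null;
        }
    }

    // Hypothetical UDF that overrides the default with per-type variants.
    static class Abs extends EvalFunc {
        @Override
        public Map<List<String>, FuncSpec> getArgToFuncMapping() {
            Map<List<String>, FuncSpec> mapping = new HashMap<>();
            mapping.put(Arrays.asList("int"), new FuncSpec("IntAbs"));
            mapping.put(Arrays.asList("double"), new FuncSpec("DoubleAbs"));
            return mapping;
        }
    }

    // What the type checker might do for a UDF call: return the name of
    // the class that should actually run for the given input schema.
    static String resolve(String udfClassName, EvalFunc udf, List<String> inputSchema) {
        Map<List<String>, FuncSpec> mapping = udf.getArgToFuncMapping();
        if (mapping == null) {
            return udfClassName; // leave the UDF as is
        }
        FuncSpec spec = mapping.get(inputSchema);
        if (spec == null) {
            // Assumption: exact match only in this sketch.
            throw new IllegalArgumentException("No implementation for " + inputSchema);
        }
        return spec.className;
    }

    public static void main(String[] args) {
        System.out.println(resolve("Legacy", new EvalFunc(), Arrays.asList("int")));
        System.out.println(resolve("Abs", new Abs(), Arrays.asList("double")));
    }
}
```

Existing UDFs that do not override the method keep working unchanged, which is the backward-compatibility point made in the proposal.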