Re: Simple UDF to return array
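Hi Sunita,
Yes, it's definitely possible, and you should use a generic UDF. A
GenericUDF declares its return type explicitly by returning an
ObjectInspector from initialize(), so Hive knows the result is an array.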
I wrote one UDF that takes n arrays (each one with the same number of
elements) and returns an array of structs which is usually used in a
lateral view.
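
To make the idea concrete, here is a minimal sketch of just the zipping step in plain Java, without any Hive classes so it can be compiled on its own. The names ZipArrays and zip are made up for illustration and this is not the UDF mentioned above. In a real GenericUDF this logic would sit inside evaluate(), and initialize() would return an ObjectInspector describing an array of structs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: plain Java only, the Hive wiring is deliberately left out.
class ZipArrays {
    // Combine n equal-length lists into a list of rows.
    // Row i holds element i of every input list, in argument order.
    static List<List<Object>> zip(List<?>... columns) {
        List<List<Object>> rows = new ArrayList<>();
        if (columns.length == 0) {
            return rows;
        }
        int n = columns[0].size();
        for (List<?> column : columns) {
            if (column.size() != n) {
                throw new IllegalArgumentException(
                        "all arrays must have the same number of elements");
            }
        }
        for (int i = 0; i < n; i++) {
            List<Object> row = new ArrayList<>(columns.length);
            for (List<?> column : columns) {
                row.add(column.get(i));
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Each inner list plays the role of one struct, which is the shape a lateral view can then turn into one output row per element.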

A good article on how to write a generic UDF is this one:
http://www.baynote.com/2012/11/a-word-from-the-engineers/
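
For the log line quoted below, the parsing itself can stay independent of Hive. The following is a rough sketch, assuming the top-level fields are separated by " * " as in the sample entry; the class name LogLineSplitter and the method parse are hypothetical and not taken from the attached LogParser. Note that the excerpt keeps record as an instance field, so unless the elided part clears it, every call would append to the same list. The sketch builds a fresh list per call instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser: assumes fields are separated by '*' with optional
// surrounding whitespace, as in the sample log entry.
class LogLineSplitter {
    static List<String> parse(String line) {
        // A new list per call, so results from earlier rows never leak in.
        List<String> record = new ArrayList<>();
        if (line == null) {
            return record;
        }
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return record;
        }
        for (String field : trimmed.split("\\s*\\*\\s*")) {
            record.add(field);
        }
        return record;
    }
}
```

A generic UDF could call a helper like this from evaluate() and return the list, with initialize() declaring the result as a list of strings.
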
On Thu, Jan 30, 2014 at 7:06 AM, Sunita Arvind <[EMAIL PROTECTED]> wrote:

> Can someone please suggest whether this is doable or not? Is a generic UDF
> the only option? How would using a generic vs. a simple UDF make any
> difference, since I would be returning the same object either way?
>
> Thank you
> Sunita
>
> ---------- Forwarded message ----------
> From: *Sunita Arvind* <[EMAIL PROTECTED]>
> Date: Wednesday, January 29, 2014
> Subject: Simple UDF to return array
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>
>
> Hello Experts,
>
> I am trying to write a UDF to parse a logline and provide the output in
> the form of an array. Basically I want to be able to use LATERAL VIEW
> explode subsequently to make it into columns.
>
> This is how a typical log entry looks:
>
> 24-JUN-2012 05:00:42 * (CONNECT_DATA=(SERVICE_NAME=abcd.efg.hij.com)(failover_mode=(type=select)(method=basic))(CID=(PROGRAM=sqlplus)(HOST=xyz)(USER=u1))(SERVER=dedicated)(INSTANCE_NAME=aaa))
> * (ADDRESS=(PROTOCOL=tcp)(HOST=9.9.9.9)(PORT=60000)) * establish *
> abcd.efg.hij.com * 0
>
> Attached is my LogParser class which is basically the UDF. Excerpts below:
>
> class LogParser extends UDF {
>   int current_index=0;
>
>   ArrayList<String> record= new ArrayList<>();
>   public ArrayList<String> evaluate(Text input) {
> ......
> String  logdate = null;
> ...
> logdate = getDate(line);
> record.add(logdate);
> return record;
>
>
> I've tried changing the return type to ArrayList<Text>, Object, etc. I just
> get an error like this when I try to use the UDF:
>
> select explode(strparse(record)) as newcols from logdump limit 1;
>
> OK
> converting to local hdfs://tlbd-ns/user/TestUser1/LogParserStrArr.jar
> Added /tmp/3c583384-0592-41a3-ad9e-b12d2207df7b_resources/LogParserStrArr.jar to class path
> Added resource: /tmp/3c583384-0592-41a3-ad9e-b12d2207df7b_resources/LogParserStrArr.jar
> OK
> FAILED: UDFArgumentException explode() takes an array or a map as a parameter
>
> I tried a cast to array and that fails as well.
>
> Requesting help from the community. I am considering writing a generic UDF,
> but this is a simple requirement and I would like to be able to use a simple
> UDF if I can.
>
> regards
> Sunita
>
>
>
--
----------------------------------------------------------
Good judgement comes with experience.
Experience comes with bad judgement.
----------------------------------------------------------
Roberto Congiu - Data Engineer - OpenX
tel: +1 626 466 1141