Hive >> mail # user >> Simple UDF to return array


Re: Simple UDF to return array
Hi Sunita,
Yes, it's definitely possible, and you should use Generic UDFs.
I wrote one UDF that takes n arrays (each one with the same number of
elements) and returns an array of structs which is usually used in a
lateral view.
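
To make the shape of that result concrete, here is a minimal plain-Java
sketch of the zipping step only. It is not the UDF itself (that code is not
shown in this thread) and it has no Hive dependency; ArrayZipSketch and zip
are hypothetical names, and each inner list stands in for one struct.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: plain Java, no Hive classes.
final class ArrayZipSketch {
    private ArrayZipSketch() {}

    /** Zips n equal-length lists into rows; row i holds element i of every input. */
    static List<List<Object>> zip(List<? extends List<?>> columns) {
        List<List<Object>> rows = new ArrayList<>();
        if (columns.isEmpty()) {
            return rows;
        }
        int length = columns.get(0).size();
        for (List<?> column : columns) {
            if (column.size() != length) {
                throw new IllegalArgumentException(
                        "all arrays must have the same number of elements");
            }
        }
        for (int i = 0; i < length; i++) {
            List<Object> row = new ArrayList<>(columns.size());
            for (List<?> column : columns) {
                row.add(column.get(i));
            }
            rows.add(row);
        }
        return rows;
    }
}
```

In a lateral view, each row produced this way would become one output row
with one column per input array.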

A good article on how to write a generic UDF is this one:
http://www.baynote.com/2012/11/a-word-from-the-engineers/
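
For the log-parsing case in the original question, a generic UDF that
returns array<string> could look roughly like the sketch below. It needs
hive-exec on the classpath. The class name LogFieldsUDF, the function name
log_fields and the " * " field delimiter are assumptions made for
illustration (the delimiter is guessed from the sample log line); the real
parsing logic from LogParser is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;

// Hypothetical class; the " * " delimiter is an assumption based on the sample line.
public class LogFieldsUDF extends GenericUDF {
    private StringObjectInspector inputOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException("log_fields() takes exactly one argument");
        }
        if (!(arguments[0] instanceof StringObjectInspector)) {
            throw new UDFArgumentTypeException(0, "log_fields() expects a string argument");
        }
        inputOI = (StringObjectInspector) arguments[0];
        // Declare the return type as array<string> so that explode() accepts it.
        return ObjectInspectorFactory.getStandardListObjectInspector(
                PrimitiveObjectInspectorFactory.javaStringObjectInspector);
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object raw = arguments[0].get();
        String line = (raw == null) ? null : inputOI.getPrimitiveJavaObject(raw);
        if (line == null) {
            return null;
        }
        // Build a fresh list on every call so values from earlier rows do not accumulate.
        List<String> fields = new ArrayList<String>();
        for (String field : line.split(" \\* ")) {
            fields.add(field.trim());
        }
        return fields;
    }

    @Override
    public String getDisplayString(String[] children) {
        return "log_fields(" + (children.length > 0 ? children[0] : "") + ")";
    }
}
```

The point of the sketch is that initialize() states the return type
explicitly through an ObjectInspector, so Hive knows the result is a list
before any row is evaluated. After ADD JAR and CREATE TEMPORARY FUNCTION,
the function can be passed to explode() the same way strparse is in the
query quoted below.
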
On Thu, Jan 30, 2014 at 7:06 AM, Sunita Arvind <[EMAIL PROTECTED]> wrote:

> Can someone please suggest whether this is doable or not? Is a generic UDF
> the only option? How would using a generic vs. a simple UDF make any
> difference, since I would be returning the same object either way?
>
> Thank you
> Sunita
>
> ---------- Forwarded message ----------
> From: *Sunita Arvind* <[EMAIL PROTECTED]>
> Date: Wednesday, January 29, 2014
> Subject: Simple UDF to return array
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>
>
> Hello Experts,
>
> I am trying to write a UDF to parse a logline and provide the output in
> the form of an array. Basically I want to be able to use LATERAL VIEW
> explode subsequently to make it into columns.
>
> This is how a typical log entry looks:
>
> 24-JUN-2012 05:00:42 * (CONNECT_DATA=(SERVICE_NAME=abcd.efg.hij.com)(failover_mode=(type=select)(method=basic))(CID=(PROGRAM=sqlplus)(HOST=xyz)(USER=u1))(SERVER=dedicated)(INSTANCE_NAME=aaa))
> * (ADDRESS=(PROTOCOL=tcp)(HOST=9.9.9.9)(PORT=60000)) * establish *
> abcd.efg.hij.com * 0
>
> Attached is my LogParser class which is basically the UDF. Excerpts below:
>
> class LogParser extends UDF {
>   int current_index = 0;
>   ArrayList<String> record = new ArrayList<>();
>
>   public ArrayList<String> evaluate(Text input) {
>     ......
>     String logdate = null;
>     ...
>     logdate = getDate(line);
>     record.add(logdate);
>     return record;
>
>
> I've tried changing the return type to ArrayList<Text>, Object, etc. I just
> get an error like this when I try to use the UDF:
>
> select explode(strparse(record)) as newcols from logdump limit 1;
>
> OK converting to local hdfs://tlbd-ns/user/TestUser1/LogParserStrArr.jar
> Added
> /tmp/3c583384-0592-41a3-ad9e-b12d2207df7b_resources/LogParserStrArr.jar to
> class path Added resource:
> /tmp/3c583384-0592-41a3-ad9e-b12d2207df7b_resources/LogParserStrArr.jar OK
> FAILED: UDFArgumentException explode() takes an array or a map as a
> parameter
>
> I tried cast to array and that fails as well.
>
> Requesting help from the community. I am considering writing generic UDF,
> but this is a simple requirement and would like to be able to use simple
> UDF if I can.
>
> regards
> Sunita
>
>
>
--
----------------------------------------------------------
Good judgement comes with experience.
Experience comes with bad judgement.
----------------------------------------------------------
Roberto Congiu - Data Engineer - OpenX
tel: +1 626 466 1141

 