Re: Many to One UDF Problem
Dipesh,

You can pass in the entire tuple (row) to the UDF.

unintended = foreach player generate name, id, Dudf_try(*);

The UDF will then be able to use the entire row:

Tuple tuple = (Tuple)input.get(0);

To process individual fields, you can iterate over the tuple above or
access its fields by position.

String name = tuple.get(0).toString();
String id = tuple.get(1).toString();
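
The field access described above can be sketched without Pig on the classpath. In this sketch a plain List<Object> stands in for Pig's Tuple, and the class RowFields and its methods rowOf and joinFields are hypothetical names, not part of Pig. Depending on how Pig hands over the arguments for Dudf_try(*), the fields may arrive nested in the first element or directly as the elements of the input, so the sketch checks for both rather than assuming one. The "<>" separator is taken from the UDF in the original question.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in sketch: a List<Object> plays the role of Pig's Tuple so the
// example runs without Pig on the classpath (an assumption, not Pig's API).
class RowFields {

    // Return the row's fields, whether they arrived nested in the first
    // element or directly as the input's own elements.
    @SuppressWarnings("unchecked")
    static List<Object> rowOf(List<Object> input) {
        if (input.size() == 1 && input.get(0) instanceof List) {
            return (List<Object>) input.get(0);
        }
        return input;
    }

    // Iterate over every field and join them with "<>"; a null field
    // contributes an empty string instead of throwing.
    static String joinFields(List<Object> input) {
        if (input == null || input.isEmpty()) {
            return null;
        }
        List<Object> row = rowOf(input);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) {
                out.append("<>");
            }
            Object field = row.get(i);
            out.append(field == null ? "" : field.toString());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinFields(Arrays.<Object>asList("John", "12")));
    }
}
```

In a real UDF the same loop would use the tuple's size() and get(int) in place of the list calls, and the null check avoids the NullPointerException that calling toString() on an empty field would raise.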

-Prashant
On Wed, May 9, 2012 at 12:00 PM, DIPESH KUMAR SINGH <[EMAIL PROTECTED]> wrote:

> (Yet another basic udf question)
>
> I want my udf to take values of all the columns in a row.
>
> For example, suppose there are 3 records in my input file (tab-delimited rows):
>
> John    12
> Jeff    33
> Chin    20
>
> Currently my UDF can take only one column (I don't know how to pass more
> than one):
>
> register 'dudf.jar';
> player = load '/pig_data/dxmlsample1.txt' as (name:chararray, id:chararray);
> -- Only name is passed here; I want the whole row (name and id) to be passed.
> unintended = foreach player generate name, id, Dudf_try(name);
> dump unintended;
>
> My UDF code is:
>
> import java.io.IOException;
> import java.util.List;
> import java.util.ArrayList;
>
> import org.apache.pig.EvalFunc;
> import org.apache.pig.FuncSpec;
> import org.apache.pig.data.Tuple;
> import org.apache.pig.data.DataType;
> import org.apache.pig.impl.logicalLayer.schema.Schema;
> import org.apache.pig.impl.logicalLayer.FrontendException;
>
> public class Dudf_try extends EvalFunc<String> {
>     public String exec(Tuple input) throws IOException {
>         if (input == null || input.size() == 0)
>             return null;
>         try {
>             String query = (String) input.get(0);
>             //String query1 = (String) input.get(1);
>
>             // Some more transformation here, but the ultimate output is a String
>
>             // Intended result is query + "<>" + query1 once the second field is passed in
>             return query;
>         } catch (Exception e) {
>             System.err.println("failed to process input; error - " + e.getMessage());
>             return null;
>         }
>     }
>
>     @Override
>     public Schema outputSchema(Schema input) {
>         return new Schema(new Schema.FieldSchema(
>                 getSchemaName(this.getClass().getName().toLowerCase(), input),
>                 DataType.CHARARRAY));
>     }
>
>     @Override
>     public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
>         // Declares a single chararray argument
>         List<FuncSpec> funcList = new ArrayList<FuncSpec>();
>         funcList.add(new FuncSpec(this.getClass().getName(),
>                 new Schema(new Schema.FieldSchema(null, DataType.CHARARRAY))));
>         return funcList;
>     }
> }
>
> I need some suggestions on how to achieve the intended behavior.
>
> Thanks!
> Dipesh
>
> --
> Dipesh Kr. Singh
>