-
Many to One UDF Problem
DIPESH KUMAR SINGH 2012-05-09, 19:00
(Yet another basic udf question)
I want my udf to take values of all the columns in a row.
For example, suppose there are 3 records in my input file (tab-delimited rows):
John 12
Jeff 33
Chin 20
Currently my UDF can take only one column (I don't know how to pass more than one):
register 'dudf.jar';
player = load '/pig_data/dxmlsample1.txt' as (name:chararray, id:chararray);
-- Only name is passed here; I want the whole row to be passed, i.e. name and id.
unintended = foreach player generate name, id, Dudf_try(name);
dump unintended;
My UDF code is:
import java.io.IOException;
import java.util.List;
import java.util.ArrayList;

import org.apache.pig.EvalFunc;
import org.apache.pig.FuncSpec;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.logicalLayer.FrontendException;

public class Dudf_try extends EvalFunc<String> {
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            String query = (String)input.get(0);
            //String query1 = (String)input.get(1);

            // Some more transformation here, but ultimate Output is String

            return query+"<>"+query1;
        } catch (Exception e) {
            System.err.println("failed to process input; error - " + e.getMessage());
            return null;
        }
    }

    @Override
    public Schema outputSchema(Schema input) {
        return new Schema(new Schema.FieldSchema(getSchemaName(this.getClass().getName().toLowerCase(), input), DataType.CHARARRAY));
    }

    @Override
    public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
        List<FuncSpec> funcList = new ArrayList<FuncSpec>();
        funcList.add(new FuncSpec(this.getClass().getName(), new Schema(new Schema.FieldSchema(null, DataType.CHARARRAY))));

        return funcList;
    }
}
I need some suggestions on how to achieve the intended behavior.
Thanks! Dipesh
-- Dipesh Kr. Singh
-
Re: Many to One UDF Problem
Prashant Kommireddi 2012-05-09, 19:09
Dipesh,
You can pass in the entire tuple (row) to the UDF.
unintended = foreach player generate name, id, Dudf_try(*);
The UDF will now be able to use the entire row:
Tuple tuple = (Tuple)input.get(0);
To process individual fields, you can iterate or positionally access the above tuple.
String name = tuple.get(0).toString();
String id = tuple.get(1).toString();
-Prashant
-
Re: Many to One UDF Problem
DIPESH KUMAR SINGH 2012-05-10, 01:18
Prashant,
I followed your directions, but I am getting the following error:
2012-05-10 06:40:38,097 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function for Dudf_try as multiple or none of them fit. Please use an explicit cast.
The detailed error log and code are attached.
Thanks, Dipesh
-- Dipesh Kr. Singh
-
Re: Many to One UDF Problem
Prashant Kommireddi 2012-05-10, 01:39
public List<FuncSpec> getArgToFuncMapping() throws FrontendException needs to be modified accordingly, since you are now passing your UDF the entire tuple. You don't really need to implement it if there is no overloaded function.
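To make the ERROR 1045 above a bit more concrete, here is a toy model of the matching step. It is only a sketch and not Pig's actual resolver (which, among other things, can also consider implicit casts): the class name FuncMappingSketch, the resolve method, and the use of plain type-name strings are all assumptions made for illustration. The declared list plays the role of what getArgToFuncMapping() returns, and callArgs plays the role of the argument types at the call site.

```java
import java.util.List;

/** Toy model (not Pig's real resolver) of choosing a UDF signature by argument types. */
final class FuncMappingSketch {

    /**
     * Returns the index of the single declared signature that equals the call's
     * argument types. Throws when none or several fit, which is the situation
     * the ERROR 1045 message describes.
     */
    static int resolve(List<List<String>> declared, List<String> callArgs) {
        int match = -1;
        for (int i = 0; i < declared.size(); i++) {
            // List.equals compares length and elements, so arity matters too.
            if (declared.get(i).equals(callArgs)) {
                if (match != -1) {
                    throw new IllegalStateException("multiple signatures fit " + callArgs);
                }
                match = i;
            }
        }
        if (match == -1) {
            throw new IllegalStateException("no signature fits " + callArgs);
        }
        return match;
    }
}
```

In this model, a single declared signature of (chararray) fits Dudf_try(name) but not Dudf_try(*) on a (name, id) row, since the call then carries two chararray arguments.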
-
Re: Many to One UDF Problem
DIPESH KUMAR SINGH 2012-05-10, 02:00
The MapReduce job runs now, but the UDF's string output is missing. It shows something like this:
(Jeff,13,)
(John,12,)
Maybe something needs to be changed in the output schema; earlier I was passing:
@Override
public Schema outputSchema(Schema input) {
    return new Schema(new Schema.FieldSchema(getSchemaName(this.getClass().getName().toLowerCase(), input), DataType.CHARARRAY));
}
Thanks, Dipesh
-- Dipesh Kr. Singh
-
Re: Many to One UDF Problem
Russell Jurney 2012-05-10, 05:18
It seems you want to group your data, then feed this group into the UDF for processing. Look at SUM and AVG, I think, for examples?
Russell Jurney http://datasyndrome.com
-
Re: Many to One UDF Problem
Prashant Kommireddi 2012-05-10, 05:29
I messed up; your original UDF does not need to be changed.
Just pass in all fields (*) as I suggested in my previous email, and access them the way you were doing it before:
String query = (String)input.get(0);
String query1 = (String)input.get(1);
That should work.
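As a minimal sketch of the body of exec() once every field is passed in, the helper below joins all fields of a row with "<>". The class RowJoinSketch and the method joinFields are hypothetical names, and the List<Object> parameter is only a stand-in for the fields of the input tuple so the snippet compiles without the Pig jars; inside a real UDF the list could come from input.getAll().

```java
import java.util.List;

/** Sketch of an exec() body: join every field of the row with "<>". */
final class RowJoinSketch {

    /**
     * fields stands in for the values handed to exec(Tuple input) when the UDF
     * is called as Dudf_try(*). A null field is written as an empty string
     * here; that choice is an assumption of this sketch.
     */
    static String joinFields(List<Object> fields) {
        if (fields == null || fields.isEmpty()) {
            return null; // mirrors the null/empty check in the original exec()
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                out.append("<>");
            }
            Object field = fields.get(i);
            // toString() instead of a (String) cast, so non-chararray fields also work.
            out.append(field == null ? "" : field.toString());
        }
        return out.toString();
    }
}
```

Using toString() rather than a (String) cast is a deliberate choice in this sketch: in the original UDF, a failed cast would be swallowed by the catch block and come back as null, which would show up as an empty field in the output.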
-Prashant
-
Re: Many to One UDF Problem
DIPESH KUMAR SINGH 2012-05-10, 05:38
Thanks Prashant, Russell.
The problem was with PigStorage while loading the tab-delimited file. Now it's running fine.
Regards, Dipesh