
Pig >> mail # user >> how to get input schema in UDF


Danfeng Li 2012-08-13, 22:43
Robert Yerex 2012-08-13, 23:14
Danfeng Li 2012-08-14, 00:03
RE: how to get input schema in UDF
OK, I found the solution.

Replacing
Schema tupleSchema = new Schema(input.getFields());
with
Schema tupleSchema = new Schema(input.getField(0).schema.getField(0).schema.getFields());

will do the trick.
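The chained call works because the schemas are nested, as the solution above relies on: the UDF's input schema holds the bag field, the bag's inner schema holds a single tuple field, and that tuple's schema holds the actual columns. A minimal sketch of that navigation and of building the row_counter output schema follows. It uses hypothetical, dependency-free stand-in classes (FieldType, FieldSchema, SimpleSchema, RowCounterSchema are illustrative names, not Pig's API) so it can run on its own; a real UDF would perform the same steps with org.apache.pig.impl.logicalLayer.schema.Schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins that mimic only the nesting of Pig's Schema and
// Schema.FieldSchema; they are NOT the Pig API.
enum FieldType { CHARARRAY, INTEGER, TUPLE, BAG }

class FieldSchema {
    final String alias;
    final FieldType type;
    final SimpleSchema schema; // inner schema for TUPLE and BAG fields, else null

    FieldSchema(String alias, FieldType type, SimpleSchema schema) {
        this.alias = alias;
        this.type = type;
        this.schema = schema;
    }

    FieldSchema(String alias, FieldType type) {
        this(alias, type, null);
    }
}

class SimpleSchema {
    private final List<FieldSchema> fields;

    SimpleSchema(List<FieldSchema> fields) {
        // Copy so that add() never modifies the list the caller passed in.
        this.fields = new ArrayList<>(fields);
    }

    FieldSchema getField(int i) { return fields.get(i); }
    List<FieldSchema> getFields() { return fields; }
    void add(FieldSchema fs) { fields.add(fs); }
}

class RowCounterSchema {
    // input.getField(0)    -> the bag argument A
    // .schema.getField(0)  -> the single tuple inside the bag
    // .schema.getFields()  -> the tuple's columns (name, age, ...)
    static List<FieldSchema> tupleFields(SimpleSchema input) {
        return input.getField(0).schema.getField(0).schema.getFields();
    }

    // Same shape as the outputSchema() in the thread:
    // row_counter: {with_counter: (<original columns>, counter: int)}
    static SimpleSchema outputSchema(SimpleSchema input) {
        SimpleSchema tupleSchema = new SimpleSchema(tupleFields(input));
        tupleSchema.add(new FieldSchema("counter", FieldType.INTEGER));
        FieldSchema tupleFs = new FieldSchema("with_counter", FieldType.TUPLE, tupleSchema);
        SimpleSchema bagSchema = new SimpleSchema(Arrays.asList(tupleFs));
        return new SimpleSchema(Arrays.asList(new FieldSchema("row_counter", FieldType.BAG, bagSchema)));
    }

    // Builds the example input A: {(name: chararray, age: int)}.
    static SimpleSchema sampleInput() {
        SimpleSchema tuple = new SimpleSchema(Arrays.asList(
                new FieldSchema("name", FieldType.CHARARRAY),
                new FieldSchema("age", FieldType.INTEGER)));
        SimpleSchema bag = new SimpleSchema(Arrays.asList(
                new FieldSchema(null, FieldType.TUPLE, tuple)));
        return new SimpleSchema(Arrays.asList(new FieldSchema("A", FieldType.BAG, bag)));
    }

    // Collects the aliases of a list of fields, for printing and checking.
    static List<String> aliases(List<FieldSchema> fields) {
        List<String> names = new ArrayList<>();
        for (FieldSchema fs : fields) {
            names.add(fs.alias);
        }
        return names;
    }

    public static void main(String[] args) {
        SimpleSchema out = outputSchema(sampleInput());
        System.out.println(aliases(out.getField(0).schema.getField(0).schema.getFields()));
    }
}
```

Running main prints [name, age, counter]. Because the column list is read from the input rather than hard-coded, an extra column such as gender would simply appear before counter. The stand-in constructor copies the field list so that appending counter leaves the input schema untouched; that copy is a choice made for this sketch.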

Thanks.
Dan

-----Original Message-----
From: Danfeng Li [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 13, 2012 5:04 PM
To: [EMAIL PROTECTED]
Subject: RE: how to get input schema in UDF

Thanks, Robert.

However, I'm still not clear on how to get the original fields of the tuple inside the bag. The following is the code that generates the schema.

import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public Schema outputSchema(Schema input) {
    try {
        Schema.FieldSchema counter = new Schema.FieldSchema("counter", DataType.INTEGER);
        // Here is my question: how do I get the fields out of the original tuple inside the bag?
        // If I use the following line, I only get the BAG, not the tuple.
        Schema tupleSchema = new Schema(input.getFields());
        // After I get the original fields from the tuple, I can add the counter here.
        tupleSchema.add(counter);

        Schema.FieldSchema tupleFs;
        tupleFs = new Schema.FieldSchema("with_counter", tupleSchema, DataType.TUPLE);

        Schema bagSchema = new Schema(tupleFs);
        return new Schema(new Schema.FieldSchema("row_counter",
                                                 bagSchema, DataType.BAG));
    } catch (Exception e) {
        return null;
    }
}

Thanks.
Dan

-----Original Message-----
From: Robert Yerex [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 13, 2012 4:15 PM
To: [EMAIL PROTECTED]
Subject: Re: how to get input schema in UDF

Chapter 10 of Alan Gates' excellent book "Programming Pig" discusses this issue.

Robert Yerex
Data Scientist
Civitas Learning

On Mon, Aug 13, 2012 at 3:43 PM, Danfeng Li <[EMAIL PROTECTED]> wrote:

> I have a bag, e.g. A: {(name: chararray, age: int)}, and I wrote a UDF
> which adds one more field to the tuple inside the bag, e.g. B: {(name:
> chararray, age: int, rank: int)}. The number of fields in the
> original bag is not fixed; e.g. I can have one more field such as gender: int.
>
> In my UDF, in order to generate the correct output schema, I need to
> get the input schema first. I tried to find some examples but
> couldn't find any. Could someone show me how to do it?
>
> Thanks.
> Dan
>
>
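At run time, the UDF described in the question performs the matching data transformation: it copies every tuple in the bag, whatever its width, and appends one integer. A minimal sketch of that logic follows, assuming a 1-based counter (the thread does not say where numbering starts) and modeling a tuple as a List<Object> and the bag as a List of them. RowCounter and addCounter are hypothetical names; a real Pig UDF would work on Pig's Tuple and DataBag objects instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the UDF's exec() logic: a "tuple" is a List<Object>
// and a "bag" is a List of tuples. Pig's Tuple/DataBag API is not used.
class RowCounter {
    // Returns a new bag whose tuples are copies of the input tuples with one
    // extra trailing field: a counter starting at 1 (an assumption).
    static List<List<Object>> addCounter(List<List<Object>> bag) {
        List<List<Object>> out = new ArrayList<>(bag.size());
        int counter = 1;
        for (List<Object> tuple : bag) {
            // Copying keeps every original field, however many there are.
            List<Object> withCounter = new ArrayList<>(tuple);
            withCounter.add(counter++);
            out.add(withCounter);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> bag = new ArrayList<>();
        bag.add(Arrays.<Object>asList("alice", 30));
        bag.add(Arrays.<Object>asList("bob", 25));
        System.out.println(addCounter(bag)); // prints [[alice, 30, 1], [bob, 25, 2]]
    }
}
```

Because each tuple is copied whole rather than read field by field, the same code handles a bag that carries an extra gender column without changes; the output schema therefore has to be derived from the input schema rather than written out by hand.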