how to get input schema in UDF
Danfeng Li 2012-08-13, 22:43
I have a bag, e.g. A: {(name: chararray, age: int)}, and I wrote a UDF that adds one more field to the tuple inside the bag, e.g. B: {(name: chararray, age: int, rank: int)}. The number of fields in the original bag is not fixed; for example, I could have one more field, such as gender: int.
In my UDF, to generate the correct output schema I first need to get the input schema. I looked for examples but couldn't find any. Could someone show me how to do it?
Thanks. Dan
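For reference, the runtime side of the UDF described here could look roughly like the sketch below. It assumes the bag is passed as the only argument and that the counter starts at 1 and follows the bag's iteration order. The class name AddRowCounter is hypothetical. The outputSchema() override, which is what the question is about, is deliberately left out.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

// Hypothetical UDF: copies every tuple of the input bag and appends a
// 1-based counter, e.g. (name, age) -> (name, age, rank).
public class AddRowCounter extends EvalFunc<DataBag> {
    private static final TupleFactory TUPLES = TupleFactory.getInstance();
    private static final BagFactory BAGS = BagFactory.getInstance();

    @Override
    public DataBag exec(Tuple input) throws IOException {
        // Assumption: the bag is the first (and only) argument.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        DataBag in = (DataBag) input.get(0);
        DataBag out = BAGS.newDefaultBag();
        int counter = 1;
        for (Tuple t : in) {
            // newTuple(List) copies the values, so the input tuple is left
            // unchanged no matter how many fields it has.
            Tuple copy = TUPLES.newTuple(t.getAll());
            copy.append(counter++);
            out.add(copy);
        }
        return out;
    }
}
```

Because the fields are copied generically, the same code handles (name, age) and (name, age, gender) alike; only the declared output schema has to change with the input.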
Re: how to get input schema in UDF
Robert Yerex 2012-08-13, 23:14
Chapter 10 in Alan Gates' excellent book "Programming Pig" discusses this issue.
Robert Yerex
Data Scientist, Civitas Learning
www.civitaslearning.com
RE: how to get input schema in UDF
Danfeng Li 2012-08-14, 00:03
Thanks, Robert.
However, I'm still not clear on how to get the original fields of the tuple inside the bag. Here is the code that generates the schema:
public Schema outputSchema(Schema input) {
    try {
        Schema.FieldSchema counter = new Schema.FieldSchema("counter", DataType.INTEGER);
        // Here is my question: how do I get the fields out of the original tuple inside the bag?
        // If I use the following line, I only get the BAG, not the tuple.
        Schema tupleSchema = new Schema(input.getFields());
        // After I get the original fields from the tuple, I can add the counter here.
        tupleSchema.add(counter);

        Schema.FieldSchema tupleFs;
        tupleFs = new Schema.FieldSchema("with_counter", tupleSchema, DataType.TUPLE);

        Schema bagSchema = new Schema(tupleFs);
        return new Schema(new Schema.FieldSchema("row_counter", bagSchema, DataType.BAG));
    } catch (Exception e) {
        return null;
    }
}
Thanks. Dan
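To see why input.getFields() only returns the bag, it may help to build by hand the argument schema that outputSchema() receives for a UDF called on a bag shaped like A. The sketch below assumes the layout the solution later in this thread relies on: the argument list holds one bag field, whose inner schema holds one tuple field, whose schema holds the row fields. The class name BagSchemas is hypothetical; a helper like this can also be used to unit-test an outputSchema() implementation without running a Pig script.

```java
import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public class BagSchemas {
    // Hypothetical helper: builds the argument schema for a UDF called with
    // a bag shaped like A: {(name: chararray, age: int)}.
    public static Schema bagOfNameAge() throws FrontendException {
        // Innermost level: the fields of each row.
        Schema rowFields = new Schema();
        rowFields.add(new Schema.FieldSchema("name", DataType.CHARARRAY));
        rowFields.add(new Schema.FieldSchema("age", DataType.INTEGER));

        // Middle level: the bag's inner schema holds a single (unnamed) tuple field.
        Schema.FieldSchema tupleField =
                new Schema.FieldSchema(null, rowFields, DataType.TUPLE);

        // Outer level: the UDF's argument list holds a single bag field named A.
        Schema.FieldSchema bagField =
                new Schema.FieldSchema("A", new Schema(tupleField), DataType.BAG);
        return new Schema(bagField);
    }
}
```

Walking back down mirrors these three levels: input.getField(0) is the bag, .schema.getField(0) is the tuple, and .schema.getFields() returns the row fields. That is why input.getFields() on its own only yields the bag.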
RE: how to get input schema in UDF
Danfeng Li 2012-08-14, 01:08
OK, I found the solution. Replacing

    Schema tupleSchema = new Schema(input.getFields());

with

    Schema tupleSchema = new Schema(input.getField(0).schema.getField(0).schema.getFields());

will do the trick.
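The one-line fix hard-codes the path input.getField(0).schema.getField(0).schema, so it assumes exactly one bag argument whose inner schema holds exactly one tuple. The sketch below walks the same three levels but checks each one, and it copies the field list before adding the counter. The copy is a precaution, on the assumption that Schema's List constructor may keep a reference to the list it is given, in which case add() would also modify the input schema. The names CounterSchema and withCounter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.logicalLayer.schema.Schema;

// Hypothetical helper: appends a "counter" field to the tuples of a bag schema.
public class CounterSchema {
    public static Schema withCounter(Schema input) throws FrontendException {
        // Level 1: the UDF arguments; expect a single bag.
        if (input == null || input.size() != 1
                || input.getField(0).type != DataType.BAG) {
            throw new IllegalArgumentException("expected one bag argument, got " + input);
        }
        Schema bagInner = input.getField(0).schema;

        // Level 2: the bag's inner schema; expect a single tuple with a schema.
        if (bagInner == null || bagInner.size() != 1
                || bagInner.getField(0).type != DataType.TUPLE
                || bagInner.getField(0).schema == null) {
            throw new IllegalArgumentException("bag has no tuple schema: " + bagInner);
        }

        // Level 3: the row fields, copied into a new list so that adding the
        // counter cannot touch the input schema's own field list.
        List<Schema.FieldSchema> fields =
                new ArrayList<Schema.FieldSchema>(bagInner.getField(0).schema.getFields());
        Schema tupleSchema = new Schema(fields);
        tupleSchema.add(new Schema.FieldSchema("counter", DataType.INTEGER));

        // Re-wrap: tuple inside a bag, using the aliases from Dan's code.
        Schema.FieldSchema tupleFs =
                new Schema.FieldSchema("with_counter", tupleSchema, DataType.TUPLE);
        return new Schema(
                new Schema.FieldSchema("row_counter", new Schema(tupleFs), DataType.BAG));
    }
}
```

An outputSchema() override could then return CounterSchema.withCounter(input) inside a try/catch that returns null on failure, as Dan's version does.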
Thanks. Dan