-
dereference bag of tuples of fields
Rodriguez, John 2010-07-30, 22:10
I have built a bag of tuples, where the tuples contain fields.
I am reading SequenceFiles and use a custom loader, MyLoader, to do this. To keep the example simple, I reduced the schema to a single field, "isValid".
I am not sure how to apply a dereference operator to this?
A = LOAD '/data/NetFlowDigests/rk/DigestMessage/part-r-00000' using MyLoader() AS (data: bag{t: tuple(isValid:int)});
DESCRIBE A;
A: {data: {t: (isValid: int)}}
Every way I have tried to dereference it produces an error.
B = GROUP A BY (data.t);
2010-07-30 21:51:29,881 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1028: Access to the tuple (t) of the bag is disallowed. Only access to the elements of the tuple in the bag is allowed.
B = GROUP A BY (data.t.isValid);
2010-07-30 21:54:11,157 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1028: Access to the tuple (t) of the bag is disallowed. Only access to the elements of the tuple in the bag is allowed.
B = GROUP A BY (t.isValid);
2010-07-30 21:55:31,475 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: t in {data: {t: (isValid: int)}}
What is the proper way to do this?
John Rodriguez
-
Re: dereference bag of tuples of fields
Thejas M Nair 2010-07-30, 22:38
Can you give an example of your data, and of the output you want from the Pig query?
That will help me understand what you want the query to do. It is not very clear to me from the schema and query alone.
-Thejas
-
RE: dereference bag of tuples of fields
Rodriguez, John 2010-07-31, 16:35
This is not exactly my data, but it is a small example based on one I saw in Pig Latin Reference Manual 2. It may help describe what I am trying to do.
http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#deref
The LOAD statement in the example below is the same as what I use for my data. I only modified the data file to demonstrate multiple tuples in a bag.
grunt> cat data
{(1,1,1)}
{(2,2,2)(3,3,3)}
{(4,4,4)(5,5,5)(6,6,6)}
grunt> A = LOAD 'data' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});
grunt> DESCRIBE A;
A: {B: {T: (t1: int,t2: int,t3: int)}}
grunt> X = FOREACH A GENERATE B.T.t1;
2010-07-31 16:09:46,659 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1028: Access to the tuple (T) of the bag is disallowed. Only access to the elements of the tuple in the bag is allowed.
So I cannot dereference as "B.T.t1"? Maybe dereference operators only work with two levels, e.g. tuple.field? But if the reference manual gives this schema as an example, how are the fields meant to be referenced in other relations?
This is perhaps a separate question, but here is what I have done. Maybe there is a simpler way to represent this in Pig.
1) Read SequenceFiles with a loadfunc.
2) The SequenceFiles have data in the value that is an "array of fields" in Java.
3) I thought that a Pig "bag of tuples" would be equivalent to a Java "array of fields".
The loadfunc "getNext" only allows returning a tuple (not a bag). So what I do in "getNext" is:
a) For each element of the Java array of fields, build a tuple that has those fields from Java.
b) Add the tuples to a bag.
c) Add the bag to a tuple and return that tuple from getNext.
Thanks,
John Rodriguez
-
Re: dereference bag of tuples of fields
Scott Carey 2010-07-31, 16:39
data.isValid
All bags are bags of tuples. The tuple is intrinsic and invisible at the syntax level; it is visible to UDFs, though. If you nest one more tuple inside that tuple, Pig gets confused. So 'bag.field' is actually a double dereference: one for the bag and one for the intrinsic tuple.
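As a rough illustration of the point above, here is a minimal sketch that reuses the LOAD statement from the original post (MyLoader and the path come from that post; the alias V is made up for this example). It projects the field directly off the bag and never names the tuple alias t. The projection is still a bag, holding one single-field tuple per original tuple, not a scalar.

```pig
-- Sketch only: MyLoader and the path are from the original post.
A = LOAD '/data/NetFlowDigests/rk/DigestMessage/part-r-00000' USING MyLoader()
    AS (data: bag{t: tuple(isValid:int)});

-- bag.field: the intrinsic tuple (t) is skipped in the syntax.
V = FOREACH A GENERATE data.isValid;

-- Inspect the schema to confirm that the projected column is a bag.
DESCRIBE V;
```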
-
RE: dereference bag of tuples of fields
Rodriguez, John 2010-08-01, 14:48
Does this mean there is no way to access the fields t1, t2, t3?
cat data
{(1,1,1)}
{(2,2,2)(3,3,3)}
{(4,4,4)(5,5,5)(6,6,6)}
A = LOAD 'data' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});
-
Re: dereference bag of tuples of fields
Ashutosh Chauhan 2010-08-01, 19:18
If you are loading data through PigStorage (which is used if you don't specify a loader), then there should be a comma separating the tuples in the bag, so your data should look like this:
cat data
{(1,1,1)}
{(2,2,2),(3,3,3)}
{(4,4,4),(5,5,5),(6,6,6)}
Then:
grunt> A = LOAD 'data' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});
grunt> C = foreach A generate B.t1, B.t2, B.t3;
grunt> dump C;
({(1)},{(1)},{(1)})
({(2),(3)},{(2),(3)},{(2),(3)})
({(4),(5),(6)},{(4),(5),(6)},{(4),(5),(6)})
Ashutosh
-
RE: dereference bag of tuples of fields
Rodriguez, John 2010-08-02, 17:04
Thanks all, for your help.
The "generate" syntax you showed now works.
But if I do this:
X = GROUP A BY B.t1;
DUMP X;
Then I get the following error:
2010-08-02 09:48:54,064 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1068: Using Bag as key not supported.
Pig Stack Trace
---------------
ERROR 1068: Using Bag as key not supported.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias X
        at org.apache.pig.PigServer.openIterator(PigServer.java:521)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
        at org.apache.pig.Main.main(Main.java:357)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias X
        at org.apache.pig.PigServer.store(PigServer.java:577)
        at org.apache.pig.PigServer.openIterator(PigServer.java:504)
        ... 6 more
-
RE: dereference bag of tuples of fields
Rodriguez, John 2010-08-02, 19:35
Also, an arithmetic expression in the GENERATE clause does not work with a bag dereference.
grunt> A = LOAD 'data' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});
grunt> C = FOREACH A GENERATE B.t1 - B.t2;
grunt> dump C;
2010-08-02 19:32:41,226 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1039: In alias C, incompatible types in Subtract Operator left hand side:bag right hand side:bag
-
Re: dereference bag of tuples of fields
Xiaomeng Wan 2010-08-03, 17:16
Try applying FLATTEN to the loaded bags first.
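The FLATTEN suggestion might look something like the sketch below. It assumes the comma-separated 'data' file and schema shown earlier in the thread; the aliases F, X and D are made up for this example. FLATTEN un-nests each bag into one row per tuple, so t1, t2 and t3 become plain int fields that can serve as a GROUP key or as operands in arithmetic.

```pig
-- Sample file 'data' as in the thread, with commas between the tuples in each bag.
A = LOAD 'data' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});

-- One output row per tuple in each bag. The fields are named B::t1, B::t2, B::t3,
-- and the short names can be used while they are unambiguous.
F = FOREACH A GENERATE FLATTEN(B);

-- The key is now a scalar int rather than a bag, which is what ERROR 1068 objected to.
X = GROUP F BY t1;

-- Both operands are ints rather than bags, which is what ERROR 1039 objected to.
D = FOREACH F GENERATE t1 - t2;
```

One trade-off to keep in mind: after FLATTEN, the rows no longer record which input bag a tuple came from. If that matters, the loader would have to emit some identifying field alongside the bag and carry it through the FOREACH.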