-
wild card for all fields in a tuple
Dexin Wang 2011-01-12, 22:51
Hi,
Hope there is some simple answer to this. I have a bunch of rows, and for each row I want to add a column that is derived from some existing columns. I have a large number of columns in my input tuple, so I don't want to repeat every name using "AS" when I generate. Is there an easy way to just append a column to the tuples without having to touch the tuple itself on the output?
Here's my example:
grunt> DESCRIBE X;
X: {id: chararray,v1: int,v2: int}

grunt> DUMP X;
(a,3,42)
(b,2,4)
(c,7,32)

I can do this:

grunt> Y = FOREACH X GENERATE (v2 - v1) as diff, id, v1, v2;
grunt> DUMP Y;
(39,a,3,42)
(2,b,2,4)
(25,c,7,32)
But I would prefer not to have to list all the v's. I may have v1, v2, v3, ..., v100.
Of course this doesn't work
grunt> Y = FOREACH X GENERATE (v2 - v1) as diff, FLATTEN(X);
What can be done to simplify this? And a related question: what is the schema after the FOREACH? I wish I could do a DESCRIBE after FOREACH.
Thanks !!
-
Re: wild card for all fields in a tuple
Jonathan Coveney 2011-01-12, 23:16
Foreach a generate function(thing), *; should do what you want. * just throws on all the columns.
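Applied to the relation X from the question, the suggestion might look like the sketch below. The input path 'input.tsv' is a hypothetical placeholder; the schema is the one shown in the original DESCRIBE X output. Note that DESCRIBE can be run on the alias produced by the FOREACH, which also covers the related question about the resulting schema.

```pig
-- Hypothetical input path, for illustration only; schema taken from the question.
X = LOAD 'input.tsv' AS (id:chararray, v1:int, v2:int);

-- Put the derived column first, then let * expand to every field of X,
-- so none of the existing columns has to be named individually.
Y = FOREACH X GENERATE (v2 - v1) AS diff, *;

-- DESCRIBE works on any alias, including one defined by a FOREACH.
DESCRIBE Y;

-- For the sample rows in the question this should give the same tuples as
-- the version that lists id, v1, v2 explicitly.
DUMP Y;
```

Writing `*, (v2 - v1) AS diff` instead would append the new column at the end rather than the front.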
-
Re: wild card for all fields in a tuple
Alan Gates 2011-01-12, 23:18
There isn't a way to do that yet. See https://issues.apache.org/jira/browse/PIG-1693 for our plans on adding it in the next release.

Alan.
-
Re: wild card for all fields in a tuple
Alan Gates 2011-01-12, 23:33
Jonathan is right, you can do all fields in a tuple with *. I was thinking of doing all fields in between two fields, which you can't do yet.

Alan.
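For readers finding this thread later: the "all fields in between two fields" feature tracked in PIG-1693 appears, to the best of my knowledge, in later Pig releases (0.9 onward, I believe) as the project-range expression `..`. The sketch below assumes a Pig version that supports it; the relation W, its schema, and the input path are hypothetical and only serve the illustration.

```pig
-- Hypothetical input path and schema, for illustration only.
W = LOAD 'wide_input.tsv' AS (id:chararray, v1:int, v2:int, v3:int, v4:int, v5:int);

-- v2 .. v4 selects v2, v3 and v4 (both endpoints are included).
A = FOREACH W GENERATE id, v2 .. v4;

-- An open-ended range: everything from v3 through the last column.
B = FOREACH W GENERATE id, v3 ..;

DESCRIBE A;
DESCRIBE B;
```

On Pig versions from the time of this thread, which predate that feature, only * (all fields) is available.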
-
Re: wild card for all fields in a tuple
Dexin Wang 2011-01-12, 23:44
Yeah, that works great. Thanks Jonathan and Alan. I can see that the all-fields-in-between feature will be totally useful for some cases.