Re: wild card for all fields in a tuple
Yeah, that works great. Thanks, Jonathan and Alan. I can see that the
"all fields in between two fields" feature will be really useful in some cases.
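
A minimal sketch of the * form Jonathan and Alan suggested, assuming the
relation X from the original message quoted below. The file name 'input.txt'
is hypothetical and only there to make the script self-contained:

```pig
-- Hypothetical input: a tab-separated file with the same schema as X
-- in the original message (id, v1, v2).
X = LOAD 'input.txt' AS (id:chararray, v1:int, v2:int);

-- Emit the derived column, then pull in every existing field with *.
Y = FOREACH X GENERATE (v2 - v1) AS diff, *;

-- DESCRIBE can be applied to any alias, including the result of a FOREACH.
DESCRIBE Y;
DUMP Y;
```

Writing the projection as `*, (v2 - v1) AS diff` instead should put the
derived column last rather than first.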

On Wed, Jan 12, 2011 at 3:33 PM, Alan Gates <[EMAIL PROTECTED]> wrote:

> Jonathan is right, you can do all fields in a tuple with *.  I was thinking
> of doing all fields in between two fields, which you can't do yet.
>
> Alan.
>
>
> On Jan 12, 2011, at 3:18 PM, Alan Gates wrote:
>
>> There isn't a way to do that yet.  See
>> https://issues.apache.org/jira/browse/PIG-1693
>> for our plans on adding it in the next release.
>>
>> Alan.
>>
>> On Jan 12, 2011, at 2:51 PM, Dexin Wang wrote:
>>
>>> Hi,
>>>
>>> Hope there is a simple answer to this. I have a bunch of rows, and for
>>> each row I want to add a column that is derived from some existing
>>> columns. I have a large number of columns in my input tuple, so I don't
>>> want to repeat each name using "AS" when I generate. Is there an easy way
>>> to just append a column to the tuples without having to touch the tuple
>>> itself in the output?
>>>
>>> Here's my example:
>>>
>>> grunt> DESCRIBE X;
>>> X: {id: chararray,v1: int,v2: int}
>>>
>>> grunt> DUMP X;
>>> (a,3,42)
>>> (b,2,4)
>>> (c,7,32)
>>>
>>> I can do this:
>>> grunt> Y = FOREACH X GENERATE (v2 - v1) as diff, id, v1, v2;
>>> grunt> DUMP Y;
>>> (39,a,3,42)
>>> (2,b,2,4)
>>> (25,c,7,32)
>>>
>>> But I would prefer not to have to list all the v's. I may have v1, v2,
>>> v3, ..., v100.
>>>
>>> Of course, this doesn't work:
>>>
>>> grunt> Y = FOREACH X GENERATE (v2 - v1) as diff, FLATTEN(X);
>>>
>>> What can be done to simplify this? And a related question: what is the
>>> schema after the FOREACH? I wish I could do a DESCRIBE after FOREACH.
>>>
>>> Thanks!!
>>>
>>
>>
>