Pig >> mail # user >> wild card for all fields in a tuple


wild card for all fields in a tuple
Hi,

I hope there is a simple answer to this. I have a bunch of rows, and for each
row I want to add a column derived from some of the existing columns. My
input tuples have a large number of columns, so I don't want to repeat every
field name (using "AS") when I GENERATE. Is there an easy way to just append
a column to each tuple without having to list the tuple's existing fields in
the output?

Here's my example:

grunt> DESCRIBE X;
X: {id: chararray,v1: int,v2: int}

grunt> DUMP X;
(a,3,42)
(b,2,4)
(c,7,32)

I can do this:
grunt> Y = FOREACH X GENERATE (v2 - v1) as diff, id, v1, v2;
grunt> DUMP Y;
(39,a,3,42)
(2,b,2,4)
(25,c,7,32)

But I would prefer not to have to list all the v's. I may have v1, v2, v3,
..., v100.
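To pin down the intended result outside of Pig, here is a minimal Python sketch (an illustration only, not Pig Latin) that models each row of X as a plain tuple and puts (v2 - v1) in front of all the original fields without naming them. The list-of-tuples representation and the function name prepend_diff are assumptions made for this illustration.

```python
# Hypothetical model of relation X from the example: each row is a tuple
# (id, v1, v2, ...), possibly with many more v columns after v2.
X = [("a", 3, 42), ("b", 2, 4), ("c", 7, 32)]


def prepend_diff(row):
    """Return a new tuple with (v2 - v1) in front of all original fields.

    Only id, v1 and v2 are unpacked by position; the remaining fields are
    carried over by unpacking the whole row, so none of them has to be
    named individually, however many v columns there are.
    """
    _id, v1, v2, *_rest = row
    return (v2 - v1, *row)


Y = [prepend_diff(row) for row in X]
for row in Y:
    print(row)
```

This yields the same rows as the DUMP Y above, and a wider row such as ("d", 1, 5, 9, 10) comes out as (4, "d", 1, 5, 9, 10), with the extra columns passed through untouched.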

Of course, this doesn't work:

grunt> Y = FOREACH X GENERATE (v2 - v1) as diff, FLATTEN(X);

What can be done to simplify this? And a related question: what is the
schema after the FOREACH? I wish I could run a DESCRIBE after the FOREACH.

Thanks!!