Pig user mailing list

wild card for all fields in a tuple
Hi,

I hope there is a simple answer to this. I have a bunch of rows, and for each
row I want to add a column derived from some of the existing columns. My input
tuples have a large number of columns, so I don't want to repeat every field
name when I GENERATE. Is there an easy way to just append a column to each
tuple without having to list the tuple's fields in the output?

Here's my example:

grunt> DESCRIBE X;
X: {id: chararray,v1: int,v2: int}

grunt> DUMP X;
(a,3,42)
(b,2,4)
(c,7,32)

I can do this:
grunt> Y = FOREACH X GENERATE (v2 - v1) as diff, id, v1, v2;
grunt> DUMP Y;
(39,a,3,42)
(2,b,2,4)
(25,c,7,32)

But I would prefer not to have to list all the v's. I may have v1, v2, v3,
..., v100.
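One way to avoid naming every field, assuming Pig 0.9 or later (the release that introduced project-range expressions with `..`), is to give only the start of a contiguous run of fields. This is a sketch; the input path `input.txt` and the comma delimiter are hypothetical stand-ins for however `X` is actually loaded:

```pig
-- Hypothetical input: comma-separated lines such as "a,3,42"
X = LOAD 'input.txt' USING PigStorage(',') AS (id:chararray, v1:int, v2:int);

-- "id.." is a project-range expression (Pig 0.9+): it expands to every field
-- from id through the last field, so extra trailing fields need no change here
Y = FOREACH X GENERATE (v2 - v1) AS diff, id..;

DUMP Y;
```

A closed range such as `v1..v100` should work the same way when only a middle block of fields is wanted.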

Of course, this doesn't work:

grunt> Y = FOREACH X GENERATE (v2 - v1) as diff, FLATTEN(X);

What can be done to simplify this? And a related question: what is the schema
after the FOREACH? I wish I could run a DESCRIBE after the FOREACH.
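As a sketch of the wildcard form: Pig's FOREACH/GENERATE accepts `*` to project all fields of the current tuple, and DESCRIBE can be applied to any alias, including one produced by a FOREACH. This assumes the same three-field schema as `X` above; the file `input.txt` and its comma delimiter are hypothetical:

```pig
-- Hypothetical input: comma-separated lines such as "a,3,42"
X = LOAD 'input.txt' USING PigStorage(',') AS (id:chararray, v1:int, v2:int);

-- * expands to all fields of the input tuple, so only the new column is named
Y = FOREACH X GENERATE (v2 - v1) AS diff, *;

-- DESCRIBE works on the FOREACH output alias just as on a LOAD alias
DESCRIBE Y;
DUMP Y;
```

The schema reported by DESCRIBE should list `diff` followed by `id`, `v1`, and `v2`, though the exact print format may differ between Pig versions.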

Thanks!!