Pig, mail # user - Unique key generation


Unique key generation
Sarath 2012-04-09, 10:40
Hi All,

I need to generate a unique key for each grouped tuple and then store it
along with each tuple.
For this I have created a UDF that generates a key (the current time in
milliseconds with a static, incrementing sequence number appended).
I use it in the script below:

1.  a = load '1.txt' using PigStorage(',') as (id: chararray, name: chararray, age: int);
2.  b = load '2.txt' using PigStorage(',') as (id: chararray, name: chararray, desg: chararray);
3.  c = cogroup a by (id, name), b by (id, name);
4.  d = filter c by not IsEmpty(a) and not IsEmpty(b);
5.  e = foreach d generate myudf.KeyGenerator(*), *;
6.  dump e;
7.  f = foreach e generate $0, flatten(a);
8.  dump f;
9.  g = foreach e generate $0, flatten(b);
10. dump g;
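
For illustration, the key scheme described above (current time in milliseconds plus a static incrementing sequence number) could look roughly like the sketch below. This is a minimal sketch under assumptions, not the actual UDF: the class name KeySketch, the method name nextKey, and the "_" separator are hypothetical, and the Pig EvalFunc wrapper that a real Java UDF would need is left out so the example runs on its own.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the key logic; a real Pig UDF would wrap
// this in a class extending org.apache.pig.EvalFunc<String>.
public class KeySketch {
    // Static counter shared by all calls within one JVM.
    private static final AtomicLong SEQ = new AtomicLong();

    // Builds "<current millis>_<sequence number>".
    // The "_" separator is an assumption made for readability.
    public static String nextKey() {
        return System.currentTimeMillis() + "_" + SEQ.incrementAndGet();
    }

    public static void main(String[] args) {
        // Two calls always produce two different keys.
        System.out.println(nextKey());
        System.out.println(nextKey());
    }
}
```

Every call to nextKey() returns a new value, so evaluating it a second time for the same input does not reproduce the key from the first evaluation.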

At step 6, I can see the unique key generated and printed.
But at steps 8 and 10, the key printed is different from the one
generated at step 6, even though I'm carrying the same key ($0 of e)
through to these steps in the script.

What is going wrong? How can I achieve this requirement?

Regards,
Sarath.