Unique key generation
Sarath 2012-04-09, 10:40
I need to generate a unique key for each grouped tuple and store it
along with each tuple.
To do this, I created a UDF that generates a key by appending a static,
incrementing sequence number to the current time in milliseconds.
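A minimal sketch of the key scheme described above, written as plain Java without the Pig EvalFunc wrapper so it runs on its own. The class name `KeyGenerator`, the method `nextKey`, and the `-` separator between the two parts are assumptions for illustration, not the actual UDF code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the key logic inside myudf.KeyGenerator.
// A real Pig UDF would wrap this in an EvalFunc<String>.
public class KeyGenerator {
    // Static counter shared by every call made in the same JVM.
    private static final AtomicLong SEQUENCE = new AtomicLong();

    // Current time in milliseconds, then the next sequence number.
    // The "-" separator is an assumption; it keeps the two parts distinct.
    public static String nextKey() {
        return System.currentTimeMillis() + "-" + SEQUENCE.incrementAndGet();
    }

    public static void main(String[] args) {
        // Two calls, even for the same input tuple, yield different keys.
        String first = nextKey();
        String second = nextKey();
        System.out.println(first);
        System.out.println(second);
        System.out.println(first.equals(second)); // prints false
    }
}
```

Each call returns a fresh value. The key therefore depends on when and how often the function is evaluated, not on the contents of the tuple.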
I used it in the script as follows:
1. a = load '1.txt' using PigStorage(',') as (id: chararray, name: chararray, age: int);
2. b = load '2.txt' using PigStorage(',') as (id: chararray, name: chararray, desg: chararray);
3. c = cogroup a by (id, name), b by (id, name);
4. d = filter c by not IsEmpty(a) and not IsEmpty(b);
5. e = foreach d generate myudf.KeyGenerator(*), *;
6. dump e;
7. f = foreach e generate $0, flatten(a);
8. dump f;
9. g = foreach e generate $0, flatten(b);
10. dump g;
At step 6, I can see the unique keys generated and printed.
But at steps 8 and 10, the printed keys differ from the ones generated
at step 6, even though I carry the same key ($0 of e) through to these
steps in the script.
What is going wrong, and how can I meet this requirement?