Pig, mail # user - TOBAG function causes trouble and nothing works on the flattened bag.


Shi Gao 2013-03-12, 20:55
Hi,

After I apply the TOBAG function and then flatten the resulting bag, the
relation can no longer be used for anything except dump or store.

Pig version: 0.8.1-cdh3u5

Input data:
a1 b1 c1
a2 b2 c2
a1 b1 c1
grunt> A = load '/mnt/hgfs/shared/test.txt'  as
(f1:chararray,f2:chararray,f3:chararray);
grunt> describe A;
A: {f1: chararray,f2: chararray,f3: chararray}

grunt> B = foreach A generate TOBAG(*);
grunt> describe B;
B: {{(f1: chararray,f2: chararray,f3: chararray)}}
-- This is wrong; it should be {(f1: chararray)}
grunt> dump B;
...
({(a1),(b1),(c1)})    -- Shows correct result though.
({(a2),(b2),(c2)})
({(a1),(b1),(c1)})
...

grunt> C = foreach B generate flatten($0);
grunt> describe C;
C: {(f1: chararray,f2: chararray,f3: chararray)} -- This is wrong.
grunt> dump C;
(a1)
(b1)
(c1)
(a2)
(b2)
(c2)
(a1)
(b1)
(c1)

-- And from here nothing can be done to C, except to dump as above.

D = foreach C generate $0;
-- gives error: java.lang.String cannot be cast to org.apache.pig.data.Tuple

To illustrate:
---------------------------------------------------------
| A     | f1: bytearray | f2: bytearray | f3: bytearray |
---------------------------------------------------------
|       | a1            | b1            | c1            |
---------------------------------------------------------
---------------------------------------------------------
| A     | f1: chararray | f2: chararray | f3: chararray |
---------------------------------------------------------
|       | a1            | b1            | c1            |
---------------------------------------------------------
--------------------------------------------------------------
| B     | bag({(f1: chararray,f2: chararray,f3: chararray)}) |
--------------------------------------------------------------
|       | {(a1), (b1), (c1)}                                 |
--------------------------------------------------------------
--------------------------------------------------------------
| C     | tuple({f1: chararray,f2: chararray,f3: chararray}) |
--------------------------------------------------------------
|       | a1                                                 |
|       | b1                                                 |
|       | c1                                                 |
--------------------------------------------------------------
Flattening C again fails too:
E = foreach C generate flatten($0);
dump E; -- gives an error.
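
For reference, the schema expected above (a bag of single-chararray tuples) can be spelled out with explicit AS clauses instead of relying on what TOBAG(*) infers. This is only a sketch: the aliases vals, t, and v are made up for illustration, and it has not been verified that declaring the schema this way avoids the cast error on 0.8.1-cdh3u5.

```pig
-- Same input file and load schema as above.
A = load '/mnt/hgfs/shared/test.txt' as (f1:chararray, f2:chararray, f3:chararray);

-- Name the fields explicitly instead of using *, and declare the bag schema
-- that describe B was expected to show (vals, t, v are hypothetical names).
B = foreach A generate TOBAG(f1, f2, f3) as vals:bag{t:tuple(v:chararray)};

-- Declare the flattened column as a plain chararray rather than a tuple.
C = foreach B generate flatten(vals) as (v:chararray);

-- The projection that failed above, now by name.
D = foreach C generate v;
dump D;
```

If describe C then reports C: {v: chararray}, the declared schema matches the rows that dump C printed.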

Could you please help with this?

Thanks,
Shi