TOBAG function causes trouble and nothing works on the flattened bag.
Hi,

After I apply the TOBAG function and then flatten the resulting bag, the
relation cannot be used for anything except dump or store.

Pig version: 0.8.1-cdh3u5

Input data:
a1 b1 c1
a2 b2 c2
a1 b1 c1
grunt> A = load '/mnt/hgfs/shared/test.txt' as (f1:chararray,f2:chararray,f3:chararray);
grunt> describe A;
A: {f1: chararray,f2: chararray,f3: chararray}

grunt> B = foreach A generate TOBAG(*);
grunt> describe B;
B: {{(f1: chararray,f2: chararray,f3: chararray)}}
-- This is wrong; it should be {(f1: chararray)}.
grunt> dump B
...
({(a1),(b1),(c1)})    -- Shows correct result though.
({(a2),(b2),(c2)})
({(a1),(b1),(c1)})
...

grunt> C = foreach B generate flatten($0);
grunt> describe C;
C: {(f1: chararray,f2: chararray,f3: chararray)} -- This is wrong.
grunt> dump C;
(a1)
(b1)
(c1)
(a2)
(b2)
(c2)
(a1)
(b1)
(c1)

-- From here on, nothing can be done with C except dumping it as above.

D = foreach C generate $0;
-- gives error: java.lang.String cannot be cast to org.apache.pig.data.Tuple

To illustrate:
---------------------------------------------------------
| A     | f1: bytearray | f2: bytearray | f3: bytearray |
---------------------------------------------------------
|       | a1            | b1            | c1            |
---------------------------------------------------------
---------------------------------------------------------
| A     | f1: chararray | f2: chararray | f3: chararray |
---------------------------------------------------------
|       | a1            | b1            | c1            |
---------------------------------------------------------
--------------------------------------------------------------
| B     | bag({(f1: chararray,f2: chararray,f3: chararray)}) |
--------------------------------------------------------------
|       | {(a1), (b1), (c1)}                                 |
--------------------------------------------------------------
--------------------------------------------------------------
| C     | tuple({f1: chararray,f2: chararray,f3: chararray}) |
--------------------------------------------------------------
|       | a1                                                 |
|       | b1                                                 |
|       | c1                                                 |
--------------------------------------------------------------
Flattening C a second time fails as well:
E = foreach C generate flatten($0);
dump E; -- gives an error.
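
To make the expected behaviour concrete, here is a small model in plain Python (not Pig). It is only a sketch of what the dumps above show: the helper names to_bag and flatten_bag are hypothetical and are not part of any Pig API. Under this model every row of C is a one-field tuple holding a string, so projecting $0 should simply return that string.

```python
# Plain-Python model of the behaviour I expect; this is NOT Pig code.
# to_bag and flatten_bag are hypothetical helpers that mimic the dumps above.

def to_bag(row):
    """Model of TOBAG(*): wrap every field of the row in its own one-field tuple."""
    return [(field,) for field in row]

def flatten_bag(bags):
    """Model of FLATTEN on a bag column: one output row per tuple in each bag."""
    return [tup for bag in bags for tup in bag]

# Same three input rows as in test.txt.
A = [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a1", "b1", "c1")]

B = [to_bag(row) for row in A]   # models: B = foreach A generate TOBAG(*);
C = flatten_bag(B)               # models: C = foreach B generate flatten($0);
D = [row[0] for row in C]        # models: D = foreach C generate $0;

print(D)
```

In this model C has the one-field shape (f1: chararray) that I expected describe to report, and D is just the nine strings in order.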

Could you please help with this?

Thanks,
Shi