Pig >> mail # user >> TOBAG function causes trouble and nothing works on the flattened bag.


Re: TOBAG function causes trouble and nothing works on the flattened bag.
Hi, Shi:
This works in Pig 0.11. Can you try it with that version?

Johnny Zhang
On Tue, Mar 12, 2013 at 1:55 PM, Shi Gao <[EMAIL PROTECTED]> wrote:

> Hi,
>
> After applying the TOBAG function and then flattening the bag, the resulting
> relation can't be used for anything except dump or store.
>
> Pig version: 0.8.1-cdh3u5
>
> Input data:
> a1 b1 c1
> a2 b2 c2
> a1 b1 c1
>
>
> grunt> A = load '/mnt/hgfs/shared/test.txt' as (f1:chararray,f2:chararray,f3:chararray);
> grunt> describe A;
> A: {f1: chararray,f2: chararray,f3: chararray}
>
> grunt> B = foreach A generate TOBAG(*);
> grunt> describe B;
> B: {{(f1: chararray,f2: chararray,f3: chararray)}}  -- This is wrong; it should be {(f1: chararray)}
> grunt> dump B;
> ...
> ({(a1),(b1),(c1)})    -- The result itself is correct, though.
> ({(a2),(b2),(c2)})
> ({(a1),(b1),(c1)})
> ...
>
> grunt> C = foreach B generate flatten($0);
> grunt> describe C;
> C: {(f1: chararray,f2: chararray,f3: chararray)} -- This is wrong.
> grunt> dump C;
> (a1)
> (b1)
> (c1)
> (a2)
> (b2)
> (c2)
> (a1)
> (b1)
> (c1)
>
> -- From here on, nothing can be done with C except dumping it as above.
>
> D = foreach C generate $0; -- fails with: java.lang.String cannot be cast to org.apache.pig.data.Tuple
>
> To illustrate:
> ---------------------------------------------------------
> | A     | f1: bytearray | f2: bytearray | f3: bytearray |
> ---------------------------------------------------------
> |       | a1            | b1            | c1            |
> ---------------------------------------------------------
> ---------------------------------------------------------
> | A     | f1: chararray | f2: chararray | f3: chararray |
> ---------------------------------------------------------
> |       | a1            | b1            | c1            |
> ---------------------------------------------------------
> --------------------------------------------------------------
> | B     | bag({(f1: chararray,f2: chararray,f3: chararray)}) |
> --------------------------------------------------------------
> |       | {(a1), (b1), (c1)}                                 |
> --------------------------------------------------------------
> --------------------------------------------------------------
> | C     | tuple({f1: chararray,f2: chararray,f3: chararray}) |
> --------------------------------------------------------------
> |       | a1                                                 |
> |       | b1                                                 |
> |       | c1                                                 |
> --------------------------------------------------------------
>
>
> Flattening C again fails as well:
> E = foreach C generate flatten($0);
> dump E; -- gives an error.
>
> Could you please help with this?
>
> Thanks,
> Shi
>