Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig >> mail # user >> TOBAG function causes trouble and nothing works on the flattened bag.


Copy link to this message
-
Re: TOBAG function causes trouble and nothing works on the flattened bag.
Hi, Shi:
This is working in 0.11. Can you try it?

Johnny Zhang
On Tue, Mar 12, 2013 at 1:55 PM, Shi Gao <[EMAIL PROTECTED]> wrote:

> Hi,
>
> After using TOBAG function and then flatten the bag, the relation can't be
> used any more except dump or store.
>
> Pig version: 0.8.1-cdh3u5
>
> Input data:
> a1 b1 c1
> a2 b2 c2
> a1 b1 c1
>
>
> grunt> A = load '/mnt/hgfs/shared/test.txt'  as
> (f1:chararray,f2:chararray,f3:chararray);
> grunt> describe
> A
> A: {f1: chararray,f2: chararray,f3: chararray}
>
> grunt> B = foreach A generate TOBAG(*);
> grunt> describe B;
> B: {{(f1: chararray,f2: chararray,f3: chararray)}}  -- This is wrong, it
> should be {(f1: chararray)}
> grunt> dump B
> ...
> ({(a1),(b1),(c1)})    -- Shows correct result though.
> ({(a2),(b2),(c2)})
> ({(a1),(b1),(c1)})
> ...
>
> grunt> C = foreach B generate flatten($0);
> grunt> describe C;
> C: {(f1: chararray,f2: chararray,f3: chararray)} -- This is wrong.
> grunt> dump C;
> (a1)
> (b1)
> (c1)
> (a2)
> (b2)
> (c2)
> (a1)
> (b1)
> (c1)
>
> -- And from here nothing can be done to C, except to dump as above.
>
> D = foreach C generate $0; -- gives error: java.lang.String cannot be cast
> to org.apache.pig.data.Tuple
>
> To illustrate:
> ---------------------------------------------------------
> | A     | f1: bytearray | f2: bytearray | f3: bytearray |
> ---------------------------------------------------------
> |       | a1            | b1            | c1            |
> ---------------------------------------------------------
> ---------------------------------------------------------
> | A     | f1: chararray | f2: chararray | f3: chararray |
> ---------------------------------------------------------
> |       | a1            | b1            | c1            |
> ---------------------------------------------------------
> --------------------------------------------------------------
> | B     | bag({(f1: chararray,f2: chararray,f3: chararray)}) |
> --------------------------------------------------------------
> |       | {(a1), (b1), (c1)}                                 |
> --------------------------------------------------------------
> --------------------------------------------------------------
> | C     | tuple({f1: chararray,f2: chararray,f3: chararray}) |
> --------------------------------------------------------------
> |       | a1                                                 |
> |       | b1                                                 |
> |       | c1                                                 |
> --------------------------------------------------------------
>
>
> However, this is wrong too:
> E = foreach C generate flatten($0);
> dump E; -- give error.
>
> Could you please help with this?
>
> Thanks,
> Shi
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB