-
TOBAG function causes trouble and nothing works on the flattened bag.
Shi Gao 2013-03-12, 20:55
Hi,
After applying the TOBAG function and then flattening the resulting bag, the relation can't be used for anything except dump or store.
Pig version: 0.8.1-cdh3u5
Input data:
a1 b1 c1
a2 b2 c2
a1 b1 c1

grunt> A = load '/mnt/hgfs/shared/test.txt' as (f1:chararray,f2:chararray,f3:chararray);
grunt> describe A
A: {f1: chararray,f2: chararray,f3: chararray}

grunt> B = foreach A generate TOBAG(*);
grunt> describe B;
B: {{(f1: chararray,f2: chararray,f3: chararray)}} -- This is wrong, it should be {(f1: chararray)}
grunt> dump B
...
({(a1),(b1),(c1)}) -- Shows correct result though.
({(a2),(b2),(c2)})
({(a1),(b1),(c1)})
...

grunt> C = foreach B generate flatten($0);
grunt> describe C;
C: {(f1: chararray,f2: chararray,f3: chararray)} -- This is wrong.
grunt> dump C;
(a1)
(b1)
(c1)
(a2)
(b2)
(c2)
(a1)
(b1)
(c1)
-- And from here nothing can be done to C, except to dump as above.
D = foreach C generate $0; -- gives error: java.lang.String cannot be cast to org.apache.pig.data.Tuple
To illustrate:
---------------------------------------------------------
| A | f1: bytearray | f2: bytearray | f3: bytearray |
---------------------------------------------------------
|   | a1            | b1            | c1            |
---------------------------------------------------------
---------------------------------------------------------
| A | f1: chararray | f2: chararray | f3: chararray |
---------------------------------------------------------
|   | a1            | b1            | c1            |
---------------------------------------------------------
--------------------------------------------------------------
| B | bag({(f1: chararray,f2: chararray,f3: chararray)}) |
--------------------------------------------------------------
|   | {(a1), (b1), (c1)}                                 |
--------------------------------------------------------------
--------------------------------------------------------------
| C | tuple({f1: chararray,f2: chararray,f3: chararray}) |
--------------------------------------------------------------
|   | a1                                                 |
|   | b1                                                 |
|   | c1                                                 |
--------------------------------------------------------------

However, this is wrong too:
E = foreach C generate flatten($0);
dump E; -- gives an error.
Could you please help with this?
Thanks, Shi
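
The mismatch reported above can be restated with a toy model in plain Python. This is only a sketch of what the dump output implies, not Pig code and not a description of Pig's internals; the helper names `tobag` and `flatten` are hypothetical stand-ins for the Pig operators. It shows that every row of C carries one field, while `describe C` reports three, which is presumably what later operators trip over.

```python
# Toy model of the grunt session above; plain Python, not Pig.

def tobag(row):
    """Stand-in for TOBAG(*): wrap each field in its own one-field tuple."""
    return [(field,) for field in row]

def flatten(bags):
    """Stand-in for flatten on a bag: emit each inner tuple as a separate row."""
    return [tup for bag in bags for tup in bag]

# The three input rows from the report.
A = [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a1", "b1", "c1")]
B = [tobag(row) for row in A]   # one bag per input row
C = flatten(B)                  # one row per bag element

# Field names that `describe C` printed in the report.
described_fields = ["f1", "f2", "f3"]

# Number of fields actually present in each row of C.
actual_arity = {len(row) for row in C}

print(C[:3])                    # [('a1',), ('b1',), ('c1',)]
print(actual_arity)             # {1}
print(len(described_fields))    # 3
```

The data matches the dump output (nine single-field rows), so only the reported schema disagrees with it.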
-
Re: TOBAG function causes trouble and nothing works on the flattened bag.
Johnny Zhang 2013-03-12, 22:29
Hi Shi, this works in Pig 0.11. Can you try it?
Johnny Zhang