Re: Pig split and join
Hi Richipal,

Please try this:

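-- Load the raw rows; PigStorage reads empty fields as null.
a = LOAD '1.txt' USING PigStorage(',') AS
(id:int,browser:chararray,type:chararray);
-- (id, browser) pairs, keeping only rows where the browser is present.
b = FOREACH a GENERATE $0, $1;
c = FILTER b by ($1 is not null);
-- (id, type) pairs for every row.
d = FOREACH a GENERATE $0, $2;
-- Join on id to attach the browser to every type, then drop the duplicate id.
e = JOIN c by id, d by id;
f = FOREACH e GENERATE $0, $1, $3;
dump f;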

This returns:

(1,firefox,p)
(1,firefox,q)
(1,firefox,r)
(1,firefox,s)
(2,ie,p)
(2,ie,s)
(3,chrome,p)
(3,chrome,r)
(3,chrome,s)
(4,netscape,p)
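
For anyone who wants to see why the join fills in the missing browser, here is a rough plain-Python model of the same steps. It is only an illustration, not Pig: the function name `fill_browser` and the in-memory `rows` list are made up for this sketch, and empty fields are represented as `None` on the assumption that PigStorage loads them as null (which is what the `is not null` filter relies on).

```python
def fill_browser(rows):
    """Model of the script: join (id, browser) pairs with (id, type) pairs."""
    # c: (id, browser) for rows where browser is present (FILTER ... is not null)
    c = [(rid, browser) for rid, browser, _ in rows if browser is not None]
    # d: (id, type) for every row
    d = [(rid, rtype) for rid, _, rtype in rows]
    # e/f: inner join on id, keeping id, browser, type
    return [(cid, browser, rtype)
            for cid, browser in c
            for did, rtype in d
            if cid == did]


# Hypothetical in-memory copy of the sample input; None stands for an empty field.
rows = [
    (1, "firefox", "p"), (1, None, "q"), (1, None, "r"), (1, None, "s"),
    (2, "ie", "p"), (2, None, "s"),
    (3, "chrome", "p"), (3, None, "r"), (3, None, "s"),
    (4, "netscape", "p"),
]

for row in fill_browser(rows):
    print(row)
```

Each id has exactly one row carrying a browser, so the join produces one output row per input row. If an id had two rows with a browser, the join would pair each of them with every type for that id.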

Thanks,
Cheolsoo

On Thu, Sep 27, 2012 at 1:23 PM, Richipal Singh <[EMAIL PROTECTED]> wrote:

> I have a requirement to propagate field values from one row to another,
> given the type of record. For example, my raw input is:
>
> 1,firefox,p
> 1,,q
> 1,,r
> 1,,s
> 2,ie,p
> 2,,s
> 3,chrome,p
> 3,,r
> 3,,s
> 4,netscape,p
>
> The desired result is:
>
> 1,firefox,p
> 1,firefox,q
> 1,firefox,r
> 1,firefox,s
> 2,ie,p
> 2,ie,s
> 3,chrome,p
> 3,chrome,r
> 3,chrome,s
> 4,netscape,p
>
> I tried
>
> A = LOAD 'file1.txt' using PigStorage(',') AS
> (id:int,browser:chararray,type:chararray);
> SPLIT A INTO B IF (type =='p'), C IF (type!='p' );
> joined =  JOIN B BY id FULL, C BY id;
> joinedFields = FOREACH joined GENERATE  B::id,  B::type, B::browser,
> C::id, C::type;
> dump joinedFields;
>
> The result I got was:
>
> (,,,1,p  )
> (,,,1,q)
> (,,,1,r)
> (,,,1,s)
> (2,p,ie,2,s)
> (3,p,chrome,3,r)
> (3,p,chrome,3,s)
> (4,p,netscape,,)
>
> Any help is appreciated, Thanks.
>
> --
> Richipal Singh
>