Pig >> mail # user >> Pig split and join


Re: Pig split and join
Hi Richipal,

Please try this:

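-- Load the rows; PigStorage reads an empty field as null.
a = LOAD '1.txt' USING PigStorage(',') AS
(id:int,browser:chararray,type:chararray);
-- c: (id, browser) for the rows that carry a browser value.
b = FOREACH a GENERATE id, browser;
c = FILTER b BY (browser IS NOT NULL);
-- d: (id, type) for every row.
d = FOREACH a GENERATE id, type;
-- Join on id, then keep id, browser, and type.
e = JOIN c BY id, d BY id;
f = FOREACH e GENERATE c::id, c::browser, d::type;
DUMP f;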

This returns:

(1,firefox,p)
(1,firefox,q)
(1,firefox,r)
(1,firefox,s)
(2,ie,p)
(2,ie,s)
(3,chrome,p)
(3,chrome,r)
(3,chrome,s)
(4,netscape,p)
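
To check the logic without a cluster, the same filter-then-join idea can be sketched in plain Python. This is only an illustration, not Pig code: the function name fill_browser and the in-memory list of tuples are assumptions made for the example, and None stands in for the null that an empty field becomes.

```python
# Plain-Python sketch of the FILTER + JOIN steps; not Pig code.
# The sample rows are the raw input from the question, with None for empty fields.
rows = [
    (1, "firefox", "p"), (1, None, "q"), (1, None, "r"), (1, None, "s"),
    (2, "ie", "p"), (2, None, "s"),
    (3, "chrome", "p"), (3, None, "r"), (3, None, "s"),
    (4, "netscape", "p"),
]


def fill_browser(rows):
    """Hypothetical helper: propagate the browser value to every row of the same id."""
    # Like relation c: (id, browser) pairs where the browser is present.
    known = [(rid, browser) for rid, browser, _ in rows if browser is not None]
    # Like relation d: (id, type) for every row.
    types = [(rid, rtype) for rid, _, rtype in rows]
    # Inner join on id, keeping id, browser, and type (like relation f).
    return [
        (kid, browser, rtype)
        for kid, browser in known
        for tid, rtype in types
        if kid == tid
    ]


# Sorted only to make the printed order predictable.
result = sorted(fill_browser(rows))
for row in result:
    print(row)
```

Because this is an inner join, an id with no browser value on any of its rows would drop out of the result, and an id with two different browser values would have each of its rows emitted once per browser.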

Thanks,
Cheolsoo

On Thu, Sep 27, 2012 at 1:23 PM, Richipal Singh <[EMAIL PROTECTED]> wrote:

> I need to propagate a field value from one row to the other rows that
> share the same id. For example, my raw input is
>
> 1,firefox,p
> 1,,q
> 1,,r
> 1,,s
> 2,ie,p
> 2,,s
> 3,chrome,p
> 3,,r
> 3,,s
> 4,netscape,p
>
> and the desired result is
>
> 1,firefox,p
> 1,firefox,q
> 1,firefox,r
> 1,firefox,s
> 2,ie,p
> 2,ie,s
> 3,chrome,p
> 3,chrome,r
> 3,chrome,s
> 4,netscape,p
>
> I tried
>
> A = LOAD 'file1.txt' using PigStorage(',') AS
> (id:int,browser:chararray,type:chararray);
> SPLIT A INTO B IF (type =='p'), C IF (type!='p' );
> joined =  JOIN B BY id FULL, C BY id;
> joinedFields = FOREACH joined GENERATE  B::id,  B::type, B::browser,
> C::id, C::type;
> dump joinedFields;
>
> the result I got was
>
> (,,,1,p  )
> (,,,1,q)
> (,,,1,r)
> (,,,1,s)
> (2,p,ie,2,s)
> (3,p,chrome,3,r)
> (3,p,chrome,3,s)
> (4,p,netscape,,)
>
> Any help is appreciated, Thanks.
>
> --
> Richipal Singh
>