Pig >> mail # user >> JOINing two inputs


Re: JOINing two inputs
Sorry, I misunderstood you. I don't think Pig operators alone can solve
this problem. You can first use the COGROUP operator to group the two
inputs together, then apply a UDF to each tuple.
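
As a rough sketch of what such a UDF might compute for each group, here is the
selection logic in plain Python. It assumes the inputs were grouped with
something like COGROUP record1 BY (game_id, user_id), record2 BY (game_id,
user_id), so that each call sees one bag per input. The function name
closest_earlier, the tuple layouts, and the choice to drop record1 rows with no
earlier record2 are all assumptions for illustration, not something from the
thread or the Pig API.

```python
def closest_earlier(record1_bag, record2_bag):
    """Pair each record1 tuple with the record2 tuple whose epoch is the
    largest one strictly less than the record1 epoch.

    Assumed layouts (hypothetical):
      record1 tuple: (epoch, game_id, user_id)
      record2 tuple: (epoch, game_id, user_id, param1)
    """
    joined = []
    for r1 in record1_bag:
        r1_epoch = r1[0]
        # Candidates: record2 tuples that occurred strictly before r1.
        earlier = [r2 for r2 in record2_bag if r2[0] < r1_epoch]
        if not earlier:
            # Assumption: record1 rows without an earlier record2 are dropped.
            continue
        # The closest one is the candidate with the largest epoch.
        best = max(earlier, key=lambda r2: r2[0])
        joined.append(tuple(r1) + (best[0], best[3]))
    return joined


# Sample data taken from the example in the original question.
record1_bag = [(50, 434, 990)]
record2_bag = [
    (67, 434, 990, "pop"),
    (43, 434, 990, "wow"),
    (42, 434, 990, "slow"),
    (23, 434, 990, "fast"),
]

result = closest_earlier(record1_bag, record2_bag)
print(result)  # [(50, 434, 990, 43, 'wow')]
```

The comparison here is strict (<), as in the original question. If equal epochs
should also match, it would need to be <= instead.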

On Mon, Sep 12, 2011 at 5:35 PM, Marek Miglinski <[EMAIL PROTECTED]> wrote:

> Thanks for the fast reply ;)
>
> Ok, I've done this:
> recordJoined = JOIN record1 BY (game_id, user_id), record2 BY (game_id,
> user_id);
>
> Now I have:
> record1.epoch::50, record1.game_id::434, record1.user_id::990,
> record2.epoch::67, record2.param1::pop
> record1.epoch::50, record1.game_id::434, record1.user_id::990,
> record2.epoch::43, record2.param1::wow
> record1.epoch::50, record1.game_id::434, record1.user_id::990,
> record2.epoch::42, record2.param1::slow
> record1.epoch::50, record1.game_id::434, record1.user_id::990,
> record2.epoch::23, record2.param1::fast
> (Other data) record1.epoch::67, record1.game_id::564, record1.user_id::889,
> record2.epoch::44, record2.param1::pop
> ...
>
> Now what?
> I can do this:
> recordFiltered = FILTER recordJoined BY record1::epoch >= record2::epoch;
>
> It will give me:
> record1.epoch::50, record1.game_id::434, record1.user_id::990,
> record2.epoch::43, record2.param1::wow
> record1.epoch::50, record1.game_id::434, record1.user_id::990,
> record2.epoch::42, record2.param1::slow
> record1.epoch::50, record1.game_id::434, record1.user_id::990,
> record2.epoch::23, record2.param1::fast
> (Other data) record1.epoch::67, record1.game_id::564, record1.user_id::889,
> record2.epoch::44, record2.param1::pop
> ...
>
> Still not what I want, I need:
> record1.epoch::50, record1.game_id::434, record1.user_id::990,
> record2.epoch::43, record2.param1::wow
> (Other data) record1.epoch::67, record1.game_id::564, record1.user_id::889,
> record2.epoch::44, record2.param1::pop
> ...
>
>
>
> Sincerely,
> Marek M.
>
> ________________________________________
> From: yonghu [[EMAIL PROTECTED]]
> Sent: Monday, September 12, 2011 5:49 PM
> To: [EMAIL PROTECTED]
> Subject: Re: JOINing two inputs
>
> I think you can first use JOIN and then apply a FILTER to each tuple.
>
> On Mon, Sep 12, 2011 at 4:19 PM, Marek Miglinski <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I have a serious task to finish, hope somebody will help me... I have two
> > inputs with data:
> >
> > record1:
> > epoch,
> > game_id,
> > user_id,
> > other data
> >
> > record2:
> > epoch,
> > game_id,
> > user_id,
> > other data
> >
> > Now I need to JOIN record1 with record2 BY game_id, oper_id, user_id,
> > epoch. BUT! The epoch in record2 must be the FIRST match found, and it
> > should be less than the epoch in record1.
> >
> > recordJoined = JOIN record1 BY (game_id, user_id), record2 BY (game_id,
> > user_id); + add something like... CLOSEST(WHERE record2::epoch <
> > record1::epoch);
> >
> > So for example:
> >
> > record1:
> > epoch::50
> > game_id::434
> > user_id::990
> >
> > record2:
> > epoch::67
> > game_id::434
> > user_id::990
> > param1::pop
> >
> > record2:
> > epoch::43
> > game_id::434
> > user_id::990
> > param1::wow
> >
> > record2:
> > epoch::42
> > game_id::434
> > user_id::990
> > param1::slow
> >
> > record2:
> > epoch::23
> > game_id::434
> > user_id::990
> > param1::fast
> >
> >
> > The result should be - record1.epoch::50, record1.game_id::434,
> > record1.user_id::990, record2.epoch::43, record2.param1::wow and ...
> >
> > Is it possible to accomplish this in Pig? Using JOIN or using FOREACH?
> >
> >
> >
> > Sincerely,
> > Marek M.
> >
> >
> >
>