Pig, mail # user - JOINing two inputs


Re: JOINing two inputs
yonghu 2011-09-12, 14:49
I think you can JOIN the two inputs first and then FILTER the joined tuples, keeping only those where the record2 epoch is less than the record1 epoch.
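
A minimal sketch of that approach, not tested against the real data. The file names, the comma delimiter, the field types, and the `param1` column are assumptions taken from the example in the question. After the JOIN and FILTER, a nested FOREACH keeps, for each record1 row, the record2 row with the largest remaining epoch:

```pig
-- Assumed paths and schemas; adjust to the real inputs.
record1 = LOAD 'record1.csv' USING PigStorage(',')
          AS (epoch:long, game_id:long, user_id:long);
record2 = LOAD 'record2.csv' USING PigStorage(',')
          AS (epoch:long, game_id:long, user_id:long, param1:chararray);

-- Pair every record1 row with every record2 row for the same game and user.
joined = JOIN record1 BY (game_id, user_id), record2 BY (game_id, user_id);

-- Keep only the pairs where record2 happened before record1.
earlier = FILTER joined BY record2::epoch < record1::epoch;

-- One group per record1 row (identified here by game, user and epoch).
grouped = GROUP earlier BY (record1::game_id, record1::user_id, record1::epoch);

-- Inside each group, take the record2 row with the latest epoch.
closest = FOREACH grouped {
    sorted = ORDER earlier BY record2::epoch DESC;
    latest = LIMIT sorted 1;
    GENERATE FLATTEN(latest);
};

DUMP closest;
```

For the sample data in the question, the FILTER leaves the record2 rows with epochs 43, 42 and 23, and the nested ORDER/LIMIT keeps the one with epoch 43 (param1 wow). Note that record1 rows sharing the same game_id, user_id and epoch fall into one group, and that the JOIN materializes every matching pair before filtering, which can be expensive for users with many records.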

On Mon, Sep 12, 2011 at 4:19 PM, Marek Miglinski <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have a serious task to finish, hope somebody will help me... I have two
> inputs with data:
>
> record1:
> epoch,
> game_id,
> user_id,
> other data
>
> record2:
> epoch,
> game_id,
> user_id,
> other data
>
> Now I need to JOIN record1 with record2 BY game_id, oper_id, user_id,
> epoch. BUT! The epoch in record2 must be the FIRST one found that is less
> than the epoch in record1.
>
> recordJoined = JOIN record1 BY (game_id, user_id), record2 BY (game_id,
> user_id); + add something like... CLOSEST(WHERE record2::epoch <
> record1::epoch);
>
> So for example:
>
> record1:
> epoch::50
> game_id::434
> user_id::990
>
> record2:
> epoch::67
> game_id::434
> user_id::990
> param1::pop
>
> record2:
> epoch::43
> game_id::434
> user_id::990
> param1::wow
>
> record2:
> epoch::42
> game_id::434
> user_id::990
> param1::slow
>
> record2:
> epoch::23
> game_id::434
> user_id::990
> param1::fast
>
>
> The result should be - record1.epoch::50, record1.game_id::434,
> record1.user_id::990, record2.epoch::43, record2.param1::wow and ...
>
> Is it possible to accomplish through PIG? Using JOIN or using FOREACH?
>
>
>
> Sincerely,
> Marek M.
>
>
>