Pig >> mail # user >> JOINing two inputs


Re: JOINing two inputs
I think you can JOIN the two inputs first, and then apply a FILTER to the joined tuples.
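
A rough sketch of that idea, in case it helps. The file paths, the comma delimiter, and the field types are assumptions on my part (the original message only names the fields), and the GROUP/ORDER/LIMIT step that picks the closest earlier record2 is my addition on top of the JOIN and FILTER. It also assumes each (game_id, user_id, epoch) combination appears only once in record1.

```pig
-- Hypothetical paths and schemas; adjust to the real data.
record1 = LOAD 'record1' USING PigStorage(',')
          AS (epoch:long, game_id:long, user_id:long);
record2 = LOAD 'record2' USING PigStorage(',')
          AS (epoch:long, game_id:long, user_id:long, param1:chararray);

-- Pair every record1 row with every record2 row for the same game and user.
joined = JOIN record1 BY (game_id, user_id), record2 BY (game_id, user_id);

-- Keep only the pairs where record2 happened before record1.
earlier = FILTER joined BY record2::epoch < record1::epoch;

-- For each record1 row, keep the record2 row with the largest (closest) epoch.
grouped = GROUP earlier BY (record1::game_id, record1::user_id, record1::epoch);
closest = FOREACH grouped {
    sorted = ORDER earlier BY record2::epoch DESC;
    top    = LIMIT sorted 1;
    GENERATE FLATTEN(top);
};

DUMP closest;
```

For the sample data in the quoted message, this should pair the record1 row with epoch 50 with the record2 row with epoch 43 (param1 wow). Two caveats: record1 rows with no earlier record2 are dropped, because the JOIN is an inner join, and if several record2 rows share the same closest epoch, the LIMIT keeps an arbitrary one of them.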

On Mon, Sep 12, 2011 at 4:19 PM, Marek Miglinski <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have a serious task to finish, hope somebody will help me... I have two
> inputs with data:
>
> record1:
> epoch,
> game_id,
> user_id,
> other data
>
> record2:
> epoch,
> game_id,
> user_id,
> other data
>
> Now I need to JOIN record1 with record2 BY game_id, oper_id, user_id,
> epoch. BUT the epoch in record2 must be the FIRST (closest) one found
> that is less than the epoch in record1.
>
> recordJoined = JOIN record1 BY (game_id, user_id), record2 BY (game_id,
> user_id); + add something like... CLOSEST(WHERE record2::epoch <
> record1::epoch);
>
> So for example:
>
> record1:
> epoch::50
> game_id::434
> user_id::990
>
> record2:
> epoch::67
> game_id::434
> user_id::990
> param1::pop
>
> record2:
> epoch::43
> game_id::434
> user_id::990
> param1::wow
>
> record2:
> epoch::42
> game_id::434
> user_id::990
> param1::slow
>
> record2:
> epoch::23
> game_id::434
> user_id::990
> param1::fast
>
>
> The result should be - record1.epoch::50, record1.game_id::434,
> record1.user_id::990, record2.epoch::43, record2.param1::wow and ...
>
> Is it possible to accomplish through PIG? Using JOIN or using FOREACH?
>
>
>
> Sincerely,
> Marek M.
>
>
>