|
Marek Miglinski
2011-09-12, 14:19
yonghu
2011-09-12, 14:49
Marek Miglinski
2011-09-12, 15:35
yonghu
2011-09-12, 19:20
Marek Miglinski
2011-09-12, 22:23
Xiaomeng Wan
2011-09-13, 21:22
|
-
JOINing two inputsMarek Miglinski 2011-09-12, 14:19
Hi,
I have a serious task to finish, hope somebody will help me... I have two inputs with data: record1: epoch, game_id, user_id, other data record2: epoch, game_id, user_id, other data Now I need to JOIN record1 with record2 BY game_id, oper_id, user_id, epoch. BUT! epoch in record2 must be FIRST found data and it should be < than epoch in record1. recordJoined = JOIN record1 BY (game_id, user_id), record2 BY (game_id, user_id); + add something like... CLOSEST(WHERE record1::epoch < record2::epoch); So for example: record1: epoch::50 game_id::434 user_id::990 record2: epoch::67 game_id::434 user_id::990 param1::pop record2: epoch::43 game_id::434 user_id::990 param1::wow record2: epoch::42 game_id::434 user_id::990 param1::slow record2: epoch::23 game_id::434 user_id::990 param1::fast The result should be - record1.epoch::50, record1.game_id::434, record1.user_id::990, record2.epoch::43, record2.param1::wow and ... Is it possible to accomplish through PIG? Using JOIN or using FOREACH? Sincerely, Marek M.
-
Re: JOINing two inputsyonghu 2011-09-12, 14:49
I think you can first use join and then for each tuple using filter.
On Mon, Sep 12, 2011 at 4:19 PM, Marek Miglinski <[EMAIL PROTECTED]>wrote: > Hi, > > I have a serious task to finish, hope somebody will help me... I have two > inputs with data: > > record1: > epoch, > game_id, > user_id, > other data > > record2: > epoch, > game_id, > user_id, > other data > > Now I need to JOIN record1 with record2 BY game_id, oper_id, user_id, > epoch. BUT! epoch in record2 must be FIRST found data and it should be < > than epoch in record1. > > recordJoined = JOIN record1 BY (game_id, user_id), record2 BY (game_id, > user_id); + add something like... CLOSEST(WHERE record1::epoch < > record2::epoch); > > So for example: > > record1: > epoch::50 > game_id::434 > user_id::990 > > record2: > epoch::67 > game_id::434 > user_id::990 > param1::pop > > record2: > epoch::43 > game_id::434 > user_id::990 > param1::wow > > record2: > epoch::42 > game_id::434 > user_id::990 > param1::slow > > record2: > epoch::23 > game_id::434 > user_id::990 > param1::fast > > > The result should be - record1.epoch::50, record1.game_id::434, > record1.user_id::990, record2.epoch::43, record2.param1::wow and ... > > Is it possible to accomplish through PIG? Using JOIN or using FOREACH? > > > > Sincerely, > Marek M. > > >
-
RE: JOINing two inputsMarek Miglinski 2011-09-12, 15:35
Thanks for fast reply ;)
Ok, I've done this: recordJoined = JOIN record1 BY (game_id, user_id), record2 BY (game_id, user_id); Now I have: record1.epoch::50, record1.game_id::434, record1.user_id::990, record2.epoch::67, record2.param1::pop record1.epoch::50, record1.game_id::434, record1.user_id::990, record2.epoch::43, record2.param1::wow record1.epoch::50, record1.game_id::434, record1.user_id::990, record2.epoch::42, record2.param1::slow record1.epoch::50, record1.game_id::434, record1.user_id::990, record2.epoch::23, record2.param1::fast (Other data) record1.epoch::67, record1.game_id::564, record1.user_id::889, record2.epoch::44, record2.param1::pop ... Now what? I can do this: recordFiltered = FILTER recordJoined BY record1::epoch >= record2::epoch; It will give me: record1.epoch::50, record1.game_id::434, record1.user_id::990, record2.epoch::43, record2.param1::wow record1.epoch::50, record1.game_id::434, record1.user_id::990, record2.epoch::42, record2.param1::slow record1.epoch::50, record1.game_id::434, record1.user_id::990, record2.epoch::23, record2.param1::fast (Other data) record1.epoch::67, record1.game_id::564, record1.user_id::889, record2.epoch::44, record2.param1::pop ... Still not what I want, I need: record1.epoch::50, record1.game_id::434, record1.user_id::990, record2.epoch::43, record2.param1::wow (Other data) record1.epoch::67, record1.game_id::564, record1.user_id::889, record2.epoch::44, record2.param1::pop ... Sincerely, Marek M. ________________________________________ From: yonghu [[EMAIL PROTECTED]] Sent: Monday, September 12, 2011 5:49 PM To: [EMAIL PROTECTED] Subject: Re: JOINing two inputs I think you can first use join and then for each tuple using filter. On Mon, Sep 12, 2011 at 4:19 PM, Marek Miglinski <[EMAIL PROTECTED]>wrote: > Hi, > > I have a serious task to finish, hope somebody will help me... I have two > inputs with data: > > record1: > epoch, > game_id, > user_id, > other data > > record2: > epoch, > game_id, > user_id, > other data > > Now I need to JOIN record1 with record2 BY game_id, oper_id, user_id, > epoch. BUT! epoch in record2 must be FIRST found data and it should be < > than epoch in record1. > > recordJoined = JOIN record1 BY (game_id, user_id), record2 BY (game_id, > user_id); + add something like... CLOSEST(WHERE record1::epoch < > record2::epoch); > > So for example: > > record1: > epoch::50 > game_id::434 > user_id::990 > > record2: > epoch::67 > game_id::434 > user_id::990 > param1::pop > > record2: > epoch::43 > game_id::434 > user_id::990 > param1::wow > > record2: > epoch::42 > game_id::434 > user_id::990 > param1::slow > > record2: > epoch::23 > game_id::434 > user_id::990 > param1::fast > > > The result should be - record1.epoch::50, record1.game_id::434, > record1.user_id::990, record2.epoch::43, record2.param1::wow and ... > > Is it possible to accomplish through PIG? Using JOIN or using FOREACH? > > > > Sincerely, > Marek M. > > >
-
Re: JOINing two inputsyonghu 2011-09-12, 19:20
Sorry, I didn't understand you right. I didn't think just use Pig operator
can finish this problem. You can first use cogroup operator to group the two inputs together. Then apply a UDF to each tuple. On Mon, Sep 12, 2011 at 5:35 PM, Marek Miglinski <[EMAIL PROTECTED]>wrote: > Thanks for fast reply ;) > > Ok, I've done this: > recordJoined = JOIN record1 BY (game_id, user_id), record2 BY (game_id, > user_id); > > Now I have: > record1.epoch::50, record1.game_id::434, record1.user_id::990, > record2.epoch::67, record2.param1::pop > record1.epoch::50, record1.game_id::434, record1.user_id::990, > record2.epoch::43, record2.param1::wow > record1.epoch::50, record1.game_id::434, record1.user_id::990, > record2.epoch::42, record2.param1::slow > record1.epoch::50, record1.game_id::434, record1.user_id::990, > record2.epoch::23, record2.param1::fast > (Other data) record1.epoch::67, record1.game_id::564, record1.user_id::889, > record2.epoch::44, record2.param1::pop > ... > > Now what? > I can do this: > recordFiltered = FILTER recordJoined BY record1::epoch >= record2::epoch; > > It will give me: > record1.epoch::50, record1.game_id::434, record1.user_id::990, > record2.epoch::43, record2.param1::wow > record1.epoch::50, record1.game_id::434, record1.user_id::990, > record2.epoch::42, record2.param1::slow > record1.epoch::50, record1.game_id::434, record1.user_id::990, > record2.epoch::23, record2.param1::fast > (Other data) record1.epoch::67, record1.game_id::564, record1.user_id::889, > record2.epoch::44, record2.param1::pop > ... > > Still not what I want, I need: > record1.epoch::50, record1.game_id::434, record1.user_id::990, > record2.epoch::43, record2.param1::wow > (Other data) record1.epoch::67, record1.game_id::564, record1.user_id::889, > record2.epoch::44, record2.param1::pop > ... > > > > Sincerely, > Marek M. > > ________________________________________ > From: yonghu [[EMAIL PROTECTED]] > Sent: Monday, September 12, 2011 5:49 PM > To: [EMAIL PROTECTED] > Subject: Re: JOINing two inputs > > I think you can first use join and then for each tuple using filter. > > On Mon, Sep 12, 2011 at 4:19 PM, Marek Miglinski <[EMAIL PROTECTED] > >wrote: > > > Hi, > > > > I have a serious task to finish, hope somebody will help me... I have two > > inputs with data: > > > > record1: > > epoch, > > game_id, > > user_id, > > other data > > > > record2: > > epoch, > > game_id, > > user_id, > > other data > > > > Now I need to JOIN record1 with record2 BY game_id, oper_id, user_id, > > epoch. BUT! epoch in record2 must be FIRST found data and it should be < > > than epoch in record1. > > > > recordJoined = JOIN record1 BY (game_id, user_id), record2 BY (game_id, > > user_id); + add something like... CLOSEST(WHERE record1::epoch < > > record2::epoch); > > > > So for example: > > > > record1: > > epoch::50 > > game_id::434 > > user_id::990 > > > > record2: > > epoch::67 > > game_id::434 > > user_id::990 > > param1::pop > > > > record2: > > epoch::43 > > game_id::434 > > user_id::990 > > param1::wow > > > > record2: > > epoch::42 > > game_id::434 > > user_id::990 > > param1::slow > > > > record2: > > epoch::23 > > game_id::434 > > user_id::990 > > param1::fast > > > > > > The result should be - record1.epoch::50, record1.game_id::434, > > record1.user_id::990, record2.epoch::43, record2.param1::wow and ... > > > > Is it possible to accomplish through PIG? Using JOIN or using FOREACH? > > > > > > > > Sincerely, > > Marek M. > > > > > > >
-
RE: JOINing two inputsMarek Miglinski 2011-09-12, 22:23
Any clues anyone? I still didn't get anything myself, thinking...
Sincerely, Marek M. -----Original Message----- From: yonghu [mailto:[EMAIL PROTECTED]] Sent: Monday, September 12, 2011 10:21 PM To: [EMAIL PROTECTED] Subject: Re: JOINing two inputs Sorry, I didn't understand you right. I didn't think just use Pig operator can finish this problem. You can first use cogroup operator to group the two inputs together. Then apply a UDF to each tuple. On Mon, Sep 12, 2011 at 5:35 PM, Marek Miglinski <[EMAIL PROTECTED]>wrote: > Thanks for fast reply ;) > > Ok, I've done this: > recordJoined = JOIN record1 BY (game_id, user_id), record2 BY > (game_id, user_id); > > Now I have: > record1.epoch::50, record1.game_id::434, record1.user_id::990, > record2.epoch::67, record2.param1::pop record1.epoch::50, > record1.game_id::434, record1.user_id::990, record2.epoch::43, > record2.param1::wow record1.epoch::50, record1.game_id::434, > record1.user_id::990, record2.epoch::42, record2.param1::slow > record1.epoch::50, record1.game_id::434, record1.user_id::990, > record2.epoch::23, record2.param1::fast (Other data) > record1.epoch::67, record1.game_id::564, record1.user_id::889, > record2.epoch::44, record2.param1::pop ... > > Now what? > I can do this: > recordFiltered = FILTER recordJoined BY record1::epoch >= > record2::epoch; > > It will give me: > record1.epoch::50, record1.game_id::434, record1.user_id::990, > record2.epoch::43, record2.param1::wow record1.epoch::50, > record1.game_id::434, record1.user_id::990, record2.epoch::42, > record2.param1::slow record1.epoch::50, record1.game_id::434, > record1.user_id::990, record2.epoch::23, record2.param1::fast (Other > data) record1.epoch::67, record1.game_id::564, record1.user_id::889, > record2.epoch::44, record2.param1::pop ... > > Still not what I want, I need: > record1.epoch::50, record1.game_id::434, record1.user_id::990, > record2.epoch::43, record2.param1::wow (Other data) record1.epoch::67, > record1.game_id::564, record1.user_id::889, record2.epoch::44, > record2.param1::pop ... > > > > Sincerely, > Marek M. > > ________________________________________ > From: yonghu [[EMAIL PROTECTED]] > Sent: Monday, September 12, 2011 5:49 PM > To: [EMAIL PROTECTED] > Subject: Re: JOINing two inputs > > I think you can first use join and then for each tuple using filter. > > On Mon, Sep 12, 2011 at 4:19 PM, Marek Miglinski <[EMAIL PROTECTED] > >wrote: > > > Hi, > > > > I have a serious task to finish, hope somebody will help me... I > > have two inputs with data: > > > > record1: > > epoch, > > game_id, > > user_id, > > other data > > > > record2: > > epoch, > > game_id, > > user_id, > > other data > > > > Now I need to JOIN record1 with record2 BY game_id, oper_id, > > user_id, epoch. BUT! epoch in record2 must be FIRST found data and > > it should be < than epoch in record1. > > > > recordJoined = JOIN record1 BY (game_id, user_id), record2 BY > > (game_id, user_id); + add something like... CLOSEST(WHERE > > record1::epoch < record2::epoch); > > > > So for example: > > > > record1: > > epoch::50 > > game_id::434 > > user_id::990 > > > > record2: > > epoch::67 > > game_id::434 > > user_id::990 > > param1::pop > > > > record2: > > epoch::43 > > game_id::434 > > user_id::990 > > param1::wow > > > > record2: > > epoch::42 > > game_id::434 > > user_id::990 > > param1::slow > > > > record2: > > epoch::23 > > game_id::434 > > user_id::990 > > param1::fast > > > > > > The result should be - record1.epoch::50, record1.game_id::434, > > record1.user_id::990, record2.epoch::43, record2.param1::wow and ... > > > > Is it possible to accomplish through PIG? Using JOIN or using FOREACH? > > > > > > > > Sincerely, > > Marek M. > > > > > > >
-
Re: JOINing two inputsXiaomeng Wan 2011-09-13, 21:22
It really depends on what your 'first' means (first by what?)
if you just want a single row, you can group the filter results like this: a = group filtered by (id1, id2); b = foreach a { x = limit filtered 1; generate FLATTEN(x) as (...); } if "first" implies some kind of order, you can change "x=limit filtered 1;" to "y = order filtered by xxx; x = limit y 1; " Shawn On Mon, Sep 12, 2011 at 4:23 PM, Marek Miglinski <[EMAIL PROTECTED]> wrote: > Any clues anyone? I still didn't get anything myself, thinking... > > > > Sincerely, > Marek M. > > -----Original Message----- > From: yonghu [mailto:[EMAIL PROTECTED]] > Sent: Monday, September 12, 2011 10:21 PM > To: [EMAIL PROTECTED] > Subject: Re: JOINing two inputs > > Sorry, I didn't understand you right. I didn't think just use Pig operator can finish this problem. You can first use cogroup operator to group the two inputs together. Then apply a UDF to each tuple. > > On Mon, Sep 12, 2011 at 5:35 PM, Marek Miglinski <[EMAIL PROTECTED]>wrote: > >> Thanks for fast reply ;) >> >> Ok, I've done this: >> recordJoined = JOIN record1 BY (game_id, user_id), record2 BY >> (game_id, user_id); >> >> Now I have: >> record1.epoch::50, record1.game_id::434, record1.user_id::990, >> record2.epoch::67, record2.param1::pop record1.epoch::50, >> record1.game_id::434, record1.user_id::990, record2.epoch::43, >> record2.param1::wow record1.epoch::50, record1.game_id::434, >> record1.user_id::990, record2.epoch::42, record2.param1::slow >> record1.epoch::50, record1.game_id::434, record1.user_id::990, >> record2.epoch::23, record2.param1::fast (Other data) >> record1.epoch::67, record1.game_id::564, record1.user_id::889, >> record2.epoch::44, record2.param1::pop ... >> >> Now what? >> I can do this: >> recordFiltered = FILTER recordJoined BY record1::epoch >>> record2::epoch; >> >> It will give me: >> record1.epoch::50, record1.game_id::434, record1.user_id::990, >> record2.epoch::43, record2.param1::wow record1.epoch::50, >> record1.game_id::434, record1.user_id::990, record2.epoch::42, >> record2.param1::slow record1.epoch::50, record1.game_id::434, >> record1.user_id::990, record2.epoch::23, record2.param1::fast (Other >> data) record1.epoch::67, record1.game_id::564, record1.user_id::889, >> record2.epoch::44, record2.param1::pop ... >> >> Still not what I want, I need: >> record1.epoch::50, record1.game_id::434, record1.user_id::990, >> record2.epoch::43, record2.param1::wow (Other data) record1.epoch::67, >> record1.game_id::564, record1.user_id::889, record2.epoch::44, >> record2.param1::pop ... >> >> >> >> Sincerely, >> Marek M. >> >> ________________________________________ >> From: yonghu [[EMAIL PROTECTED]] >> Sent: Monday, September 12, 2011 5:49 PM >> To: [EMAIL PROTECTED] >> Subject: Re: JOINing two inputs >> >> I think you can first use join and then for each tuple using filter. >> >> On Mon, Sep 12, 2011 at 4:19 PM, Marek Miglinski <[EMAIL PROTECTED] >> >wrote: >> >> > Hi, >> > >> > I have a serious task to finish, hope somebody will help me... I >> > have two inputs with data: >> > >> > record1: >> > epoch, >> > game_id, >> > user_id, >> > other data >> > >> > record2: >> > epoch, >> > game_id, >> > user_id, >> > other data >> > >> > Now I need to JOIN record1 with record2 BY game_id, oper_id, >> > user_id, epoch. BUT! epoch in record2 must be FIRST found data and >> > it should be < than epoch in record1. >> > >> > recordJoined = JOIN record1 BY (game_id, user_id), record2 BY >> > (game_id, user_id); + add something like... CLOSEST(WHERE >> > record1::epoch < record2::epoch); >> > >> > So for example: >> > >> > record1: >> > epoch::50 >> > game_id::434 >> > user_id::990 >> > >> > record2: >> > epoch::67 >> > game_id::434 >> > user_id::990 >> > param1::pop >> > >> > record2: >> > epoch::43 >> > game_id::434 >> > user_id::990 >> > param1::wow >> > >> > record2: >> > epoch::42 >> > game_id::434 >> > user_id::990 >> > param1::slow >> > >> > record2: >> > epoch: |