-
returning a field based on a function of another field
Matthew Purdy 2013-01-30, 20:14
I am trying to take the MAX of fieldA within a group and return fieldB from the same record that holds that maximum. So far, however, I get the MAX of fieldA along with a bag of all the fieldB values in the group.
To illustrate the problem, I have created a trivial example with three files (test.pig, test.txt, and test.out): the Pig script, the input data, and the output. I have also attached these files for convenience.
It seems logical to get these results back; however, I don't know how to make Pig give me what I want.

Given the following input file (nothing important, just an example; the fields are letter, ASCII value (first upper case, then lower case), and a value):

    a 65 1
    b 66 2
    c 67 3
    ...
    a 97 10
    b 98 20
    c 99 30

I would like to return the following (for the max of the second field, i.e. the ASCII value of the lower-case letter, give the value):

    (a,97,10)
    (b,98,20)
    (c,99,30)
    ...

However, I get the following output:

    (a,97.0,{(1),(10)})
    (b,98.0,{(2),(20)})
    (c,99.0,{(3),(30)})
My Pig script is the following:

    letters = load '$input_path' as (letter:chararray, ascii:chararray, value:int);
    letter_group = group letters by letter;
    letter_with_max = foreach letter_group generate group, MAX(letters.ascii), letters.value;
    dump letter_with_max;

--
Thank You,
Matthew Purdy
------------------------------------------------------------------------------------------------------------------ Matthew Purdy [EMAIL PROTECTED] 443.848.1595 -------------------------------------- "Lead, follow, or get out of the way." -- Thomas Paine "Make everything as simple as possible, but not simpler." -- Albert Einstein "The definition of insanity is doing the same thing over and over and expecting a different result." -- Benjamin Franklin "We can't solve problems by using the same kind of thinking we used when we created them." -- Albert Einstein ------------------------------------------------------------------------------------------------------------------
-
Re: returning a field based on a function of another field
Cheolsoo Park 2013-01-30, 21:07
Hi Matthew,
Try this:
    letters = load '$input_path' as (letter:chararray, ascii, value:int);
    letter_group = group letters by letter;
    letter_with_max = foreach letter_group generate group as letter, MAX(letters.ascii) as max;
    ascii_with_value = foreach letters generate ascii, value;
    joined = join ascii_with_value by ascii, letter_with_max by max using 'replicated';
    results = foreach joined generate letter, max, value;
    dump results;
Note that I am using a replicated join, on the assumption that the letter-to-max-ascii relation (letter_with_max) is small enough to fit in memory. If that's not true, please remove the "using 'replicated'" clause.
The result looks like:
    (a,97.0,10)
    (b,98.0,20)
    (c,99.0,30)
    (d,100.0,40)
    (e,101.0,50)
    (f,102.0,60)
    (g,103.0,70)
    (h,104.0,80)
    (i,105.0,90)
    (j,106.0,100)
    (k,107.0,110)
    (l,108.0,120)
    (m,109.0,130)
    (n,110.0,140)
    (o,111.0,150)
    (p,112.0,160)
    (q,113.0,170)
    (r,114.0,180)
    (s,115.0,190)
    (t,116.0,200)
    (u,117.0,210)
    (v,118.0,220)
    (w,119.0,230)
    (x,120.0,240)
    (y,121.0,250)
    (z,122.0,260)
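One caveat with joining on the ascii value alone: if two different letters ever shared the same ascii value, a maximum could match a row belonging to another letter. That does not happen in the sample data, but a variant that joins on both letter and ascii may be safer in general. The following is only a sketch under assumptions not in the thread: ascii is loaded as int (so MAX returns an int and the join keys have matching types), and the alias max_ascii is a hypothetical name chosen for illustration.

```pig
-- Sketch only: assumes the ascii column can be loaded as int.
letters = load '$input_path' as (letter:chararray, ascii:int, value:int);
letter_group = group letters by letter;

-- One row per letter: the letter and its largest ascii value.
-- max_ascii is a hypothetical alias, not taken from the thread.
letter_with_max = foreach letter_group generate group as letter, MAX(letters.ascii) as max_ascii;

-- Join on (letter, ascii) so a maximum only matches rows of its own letter.
joined = join letters by (letter, ascii), letter_with_max by (letter, max_ascii);

-- Both inputs have a field named letter, so disambiguate with the relation prefix.
results = foreach joined generate letters::letter, letters::ascii, letters::value;
dump results;
```

With an int ascii column the maximum should print as an integer rather than as 97.0. As in the script above, "using 'replicated'" could be appended to the join if letter_with_max is known to fit in memory.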
Thanks,
Cheolsoo
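For readers who would rather avoid the join, a nested foreach that sorts each group and keeps its first row is another possible approach. This is a minimal sketch, not the method used in the thread; it assumes ascii is loaded as int so that the ordering is numeric, and the aliases sorted and top_row are hypothetical names.

```pig
-- Sketch only: assumes the ascii column can be loaded as int.
letters = load '$input_path' as (letter:chararray, ascii:int, value:int);
letter_group = group letters by letter;

letter_with_max = foreach letter_group {
    -- Sort the rows of this group by ascii, largest first.
    sorted = order letters by ascii desc;
    -- Keep only the row with the largest ascii value.
    top_row = limit sorted 1;
    -- Flatten the one-row bag back into (letter, ascii, value).
    generate flatten(top_row);
};
dump letter_with_max;
```

One design difference: when several rows in a group tie for the maximum, this version keeps exactly one of them, whereas the join-based approach returns every tied row.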
-
Re: returning a field based on a function of another field
Matthew Purdy 2013-01-30, 22:28
thanx; that worked.