Pig, mail # user - returning a field base on a function of another field

Matthew Purdy 2013-01-30, 20:14
I am trying to apply the MAX function to fieldA within a group and return
the fieldB value from the record that holds that maximum. So far, however,
I get the MAX fieldA value along with a bag of all the fieldB values in
the group.

To illustrate the problem, here is a trivial example. I have created three
files (test.pig, test.txt, and test.out), which are the Pig script, the input
data, and the output results. I have also attached these files for convenience.

Getting these results back seems logical; however, I don't know how to
have Pig give me what I want.
Given the following input file (nothing important, just an example):
(fields are letter, ASCII value (first upper case, then lower case), a value)
a    65    1
b    66    2
c    67    3
a    97    10
b    98    20
c    99    30

I would like to return the following
(given the max of the second field (ascii value of lower case), give the

However, I get the following output

My Pig script is the following:

letters         = load '$input_path' as (letter:chararray, ascii:chararray,
letter_group    = group letters by letter;
letter_with_max = foreach letter_group generate group, MAX(letters.ascii),
dump letter_with_max;
Thank You,
Matthew Purdy
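
For reference, one common way to return the whole record that holds the per-group maximum is a nested FOREACH that sorts each group's bag and keeps the first row. This is only a sketch and makes two assumptions not confirmed by the post: the third field is called `value` (the schema in the script is cut off), and `ascii` is loaded as int so that the ordering is numeric rather than string-based.

```pig
-- Sketch only: 'value' is an assumed name for the third field, and ascii is
-- loaded as int (an assumption) so that ORDER compares numbers, not strings.
letters      = LOAD '$input_path' AS (letter:chararray, ascii:int, value:int);
letter_group = GROUP letters BY letter;

-- Inside each group, sort the bag by ascii descending and keep the first record.
letter_with_max = FOREACH letter_group {
    sorted  = ORDER letters BY ascii DESC;
    top_row = LIMIT sorted 1;
    GENERATE FLATTEN(top_row);   -- emits (letter, ascii, value) of the max-ascii record
};

DUMP letter_with_max;
```

MAX(letters.ascii) and letters.value are computed independently over the same bag, which is why the original attempt returns every value in the group. Sorting and limiting keeps the fields of a single record together. If several records tie for the maximum, LIMIT 1 keeps only one of them.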
