returning a field based on a function of another field
I am trying to apply the MAX function to fieldA within a group and return
the fieldB value from the same record that holds that maximum. So far,
however, I get the MAX fieldA value along with a bag of all the fieldB
values in the group.

To illustrate the problem, here is a trivial example. I have created three
files (test.pig, test.txt, and test.out): the Pig script, the input data,
and the output results. I have also attached these files for convenience.

Getting these results back seems logical; however, I don't know how to
make Pig give me what I want.
Given the following input file (nothing important, just an example):
(fields are: letter, ASCII value (first upper case, then lower case), a value)
a    65    1
b    66    2
c    67    3
...
a    97    10
b    98    20
c    99    30

I would like to return the following
(for the max of the second field, i.e. the ASCII value of the lower-case
letter, give the associated value):
(a,97,10)
(b,98,20)
(c,99,30)
...

However, I get the following output:
(a,97.0,{(1),(10)})
(b,98.0,{(2),(20)})
(c,99.0,{(3),(30)})

My Pig script is the following:

letters         = load '$input_path' as (letter:chararray, ascii:chararray, value:int);
letter_group    = group letters by letter;
letter_with_max = foreach letter_group generate group, MAX(letters.ascii), letters.value;
dump letter_with_max;
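
For reference, one possible way to get the desired rows is sketched below. In the script above, MAX(letters.ascii) and letters.value are evaluated independently: the first collapses the bag to a single value, while the second projects the value column of every tuple in the group, which is why a bag such as {(1),(10)} comes back. The sketch instead sorts each group inside a nested foreach and keeps only the top tuple. It assumes that ascii is declared as int (so the ordering is numeric rather than lexical) and that the input is tab-delimited, as the default loader expects; the aliases sorted and top_row are illustrative names, not from the original script.

```pig
-- Sketch only: assumes ascii is loaded as int and the input is tab-delimited.
letters = LOAD '$input_path' AS (letter:chararray, ascii:int, value:int);

letter_group = GROUP letters BY letter;

-- For each group, sort its tuples by ascii (largest first) and keep one.
letter_with_max = FOREACH letter_group {
    sorted  = ORDER letters BY ascii DESC;  -- 'sorted' is an illustrative alias
    top_row = LIMIT sorted 1;               -- 'top_row' is an illustrative alias
    GENERATE FLATTEN(top_row);              -- yields (letter, ascii, value)
};

DUMP letter_with_max;
```

If two records in a group share the same maximum ascii, LIMIT 1 keeps only one of them, and which one is not defined. The order of the dumped tuples is likewise not guaranteed.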
--
Thank You,
Matthew Purdy

Cheolsoo Park 2013-01-30, 21:07
Matthew Purdy 2013-01-30, 22:28