Pig, mail # user - returning a field based on a function of another field


Matthew Purdy 2013-01-30, 20:14
Re: returning a field based on a function of another field
Cheolsoo Park 2013-01-30, 21:07
Hi Matthew,

Try this:

letters          = load '$input_path' as (letter:chararray, ascii,
value:int);
letter_group     = group letters by letter;
letter_with_max  = foreach letter_group generate group as letter,
MAX(letters.ascii) as max;
ascii_with_value = foreach letters generate ascii, value;
joined           = join ascii_with_value by ascii, letter_with_max by max
using 'replicated';
results          = foreach joined generate letter, max, value;
dump results;

Note that I am using a replicated join on the assumption that letter_with_max
(the letter-to-max-ascii relation) is small enough to fit in memory. If that's
not true, please remove the "using 'replicated'" clause.

The result looks like:

(a,97.0,10)
(b,98.0,20)
(c,99.0,30)
(d,100.0,40)
(e,101.0,50)
(f,102.0,60)
(g,103.0,70)
(h,104.0,80)
(i,105.0,90)
(j,106.0,100)
(k,107.0,110)
(l,108.0,120)
(m,109.0,130)
(n,110.0,140)
(o,111.0,150)
(p,112.0,160)
(q,113.0,170)
(r,114.0,180)
(s,115.0,190)
(t,116.0,200)
(u,117.0,210)
(v,118.0,220)
(w,119.0,230)
(x,120.0,240)
(y,121.0,250)
(z,122.0,260)
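
If the join itself is a concern, a nested foreach can pick the row with the
largest ascii per letter without joining at all. This is only a minimal
sketch: it assumes ascii can be loaded as int (so the second field prints as
97 rather than 97.0), that a tie on the max may be broken arbitrarily, and the
relation names by_letter, sorted, top and top_per_letter are made up for
illustration.

```pig
-- Assumption: ascii is declared as int so that ordering is numeric.
letters        = load '$input_path' as (letter:chararray, ascii:int, value:int);
by_letter      = group letters by letter;

-- For each letter, sort its bag by ascii descending and keep the first row.
top_per_letter = foreach by_letter {
    sorted = order letters by ascii desc;
    top    = limit sorted 1;
    -- flatten turns the one-row bag into plain (letter, ascii, value) fields.
    generate flatten(top);
};
dump top_per_letter;
```

Because the whole row is kept, value comes along with the max ascii and no
second pass over letters is needed.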

Thanks,
Cheolsoo
On Wed, Jan 30, 2013 at 12:14 PM, Matthew Purdy <[EMAIL PROTECTED]> wrote:

> I am trying to use the MAX function on fieldA of a group and return another
> fieldB associated with the record that the function returned; however, from
> what I have done so far, I get the MAX fieldA value along with a list of all
> values of the associated fieldB that are in the group.
>
> To express my problem, here is a trivial example. I have created three files
> (test.pig, test.txt, and test.out), which are the Pig script, the input data,
> and the output results. I have also attached these files for convenience.
>
> It seems logical to get these results back; however, I don't know how to
> have Pig give me what I want.
>
>
> Given the following input file (nothing important, just an example):
> (fields are letter, ASCII value (first upper, then lower case), and a value)
> a    65    1
> b    66    2
> c    67    3
> ...
> a    97    10
> b    98    20
> c    99    30
>
> I would like to return the following
> (given the max of the second field (the ASCII value of the lower-case
> letter), give the value):
> (a,97,10)
> (b,98,20)
> (c,99,30)
> ...
>
> However, I get the following output:
> (a,97.0,{(1),(10)})
> (b,98.0,{(2),(20)})
> (c,99.0,{(3),(30)})
>
> My Pig script is the following:
>
> letters         = load '$input_path' as (letter:chararray,
> ascii:chararray, value:int);
> letter_group    = group letters by letter;
> letter_with_max = foreach letter_group generate group, MAX(letters.ascii),
> letters.value;
> dump letter_with_max;
>
>
>
>
> --
> Thank You,
> Matthew Purdy
>
>
> ------------------------------------------------------------------------------------------------------------------
> Matthew Purdy
> [EMAIL PROTECTED]
> 443.848.1595
> --------------------------------------
> "Lead, follow, or get out of the way." -- Thomas Paine
> "Make everything as simple as possible, but not simpler." -- Albert
> Einstein
> "The definition of insanity is doing the same thing over and over and
> expecting a different result." -- Benjamin Franklin
> "We can't solve problems by using the same kind of thinking we used when
> we created them." -- Albert Einstein
> ------------------------------------------------------------------------------------------------------------------
>
>
Matthew Purdy 2013-01-30, 22:28