Pig >> mail # user >> returning a field base on a function of another field


Matthew Purdy 2013-01-30, 20:14
Cheolsoo Park 2013-01-30, 21:07
Re: returning a field base on a function of another field
thanx; that worked.
On Wed, Jan 30, 2013 at 4:07 PM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Hi Matthew,
>
> Try this:
>
> letters          = load '$input_path' as (letter:chararray, ascii, value:int);
> letter_group     = group letters by letter;
> letter_with_max  = foreach letter_group generate group as letter, MAX(letters.ascii) as max;
> ascii_with_value = foreach letters generate ascii, value;
> joined           = join ascii_with_value by ascii, letter_with_max by max using 'replicated';
> results          = foreach joined generate letter, max, value;
> dump results;
>
> Note that I am using a replicated join, assuming that the letter-to-max-ascii
> relation is small enough to fit in memory. If that's not true, please remove
> the "using 'replicated'" clause.
>
> The result looks like:
>
> (a,97.0,10)
> (b,98.0,20)
> (c,99.0,30)
> (d,100.0,40)
> (e,101.0,50)
> (f,102.0,60)
> (g,103.0,70)
> (h,104.0,80)
> (i,105.0,90)
> (j,106.0,100)
> (k,107.0,110)
> (l,108.0,120)
> (m,109.0,130)
> (n,110.0,140)
> (o,111.0,150)
> (p,112.0,160)
> (q,113.0,170)
> (r,114.0,180)
> (s,115.0,190)
> (t,116.0,200)
> (u,117.0,210)
> (v,118.0,220)
> (w,119.0,230)
> (x,120.0,240)
> (y,121.0,250)
> (z,122.0,260)
>
> Thanks,
> Cheolsoo
>
>
> On Wed, Jan 30, 2013 at 12:14 PM, Matthew Purdy <[EMAIL PROTECTED]> wrote:
>
> > i am trying to use the MAX function on fieldA of a group and return another
> > field, fieldB, from the record that holds that maximum; however, from what
> > i have done so far i get the MAX fieldA value along with a list of all
> > values of the associated fieldB that are in the group.
> >
> > to express my problem, here is a trivial example. i have created three
> > files (test.pig, test.txt, and test.out), which are the pig script, the
> > input data, and the output results. i have also attached these files for
> > convenience.
> >
> > it seems logical to get these results back; however, i don't know how to
> > have pig give me what i want.
> >
> >
> > given the following input file (nothing important just an example):
> > (fields are letter, ascii value (first upper then lower), a value)
> > a    65    1
> > b    66    2
> > c    67    3
> > ...
> > a    97    10
> > b    98    20
> > c    99    30
> >
> > i would like to return the following
> > (given the max of the second field (ascii value of lower case), give the
> > value)
> > (a,97,10)
> > (b,98,20)
> > (c,99,30)
> > ...
> >
> > however, i get the following output
> > (a,97.0,{(1),(10)})
> > (b,98.0,{(2),(20)})
> > (c,99.0,{(3),(30)})
> >
> > my pig script is the following:
> >
> > letters         = load '$input_path' as (letter:chararray, ascii:chararray, value:int);
> > letter_group    = group letters by letter;
> > letter_with_max = foreach letter_group generate group, MAX(letters.ascii), letters.value;
> > dump letter_with_max;
> >
> >
> >
> >
> > --
> > Thank You,
> > Matthew Purdy
> >
> >
> >
> > ------------------------------------------------------------------------------------------------------------------
> > Matthew Purdy
> > [EMAIL PROTECTED]
> > 443.848.1595
> > --------------------------------------
> > "Lead, follow, or get out of the way." -- Thomas Paine
> > "Make everything as simple as possible, but not simpler." -- Albert Einstein
> > "The definition of insanity is doing the same thing over and over and
> > expecting a different result." -- Benjamin Franklin
> > "We can't solve problems by using the same kind of thinking we used when
> > we created them." -- Albert Einstein
> > ------------------------------------------------------------------------------------------------------------------
> >
> >
>

--
Thank You,
Matthew Purdy

------------------------------------------------------------------------------------------------------------------
Matthew Purdy
[EMAIL PROTECTED]
443.848.1595
--------------------------------------
"Lead, follow, or get out of the way." -- Thomas Paine
"Make everything as simple as possible, but not simpler." -- Albert Einstein
"The definition of insanity is doing the same thing over and over and
expecting a different result." -- Benjamin Franklin
"We can't solve problems by using the same kind of thinking we used when we
created them." -- Albert Einstein
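
An alternative sketch, not taken from the thread: the original script returned a bag because letters.value projects every value in the group, not just the one on the row holding the maximum. Besides the join shown in the reply, the matching row can usually be picked by sorting each group inside a nested foreach and keeping the first row. This assumes the same three-column input as in the question, declares ascii as int (an assumption; the scripts in the thread leave it untyped or chararray), and assumes a Pig version that accepts order and limit inside a nested foreach block. The aliases sorted and top_row are made up for illustration.

```pig
-- Load the input; typing ascii as int is an assumption made for this sketch.
letters      = load '$input_path' as (letter:chararray, ascii:int, value:int);
letter_group = group letters by letter;
-- For each letter, sort its rows by ascii descending and keep only the first row.
results      = foreach letter_group {
    sorted  = order letters by ascii desc;
    top_row = limit sorted 1;
    -- flatten turns the one-row bag into plain (letter, ascii, value) fields.
    generate flatten(top_row);
};
dump results;
```

If two rows in a group tie on ascii, limit keeps an arbitrary one of them, whereas the join approach would return every row whose ascii equals the maximum.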