Pig user mailing list: Get field from bag with constraints from same relation


Re: Get field from bag with constraints from same relation
Hi Thomas,

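Try this. It flattens the bag, drops the tuples whose m is greater than n, and then keeps the tuple with the largest m for each n: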

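-- Each row: an int n and a bag of (m, s) tuples; the test file is '|'-delimited.
data1 = LOAD '1.txt' USING PigStorage('|') AS (n:int, B:bag{(m:int,s:chararray)});
-- One record per bag element: (n, B::m, B::s).
data2 = FOREACH data1 GENERATE n, FLATTEN(B);
-- Keep only the elements whose m is less than or equal to n.
data3 = FILTER data2 BY B::m <= n;
data4 = GROUP data3 BY n;
-- Within each group, sort by m descending and keep the first element.
data5 = FOREACH data4 {
    data6 = ORDER data3 BY B::m DESC;
    data7 = LIMIT data6 1;
    GENERATE data7;
}
-- Unnest the single-element bags, then project n and its string.
data8 = FOREACH data5 GENERATE FLATTEN(data7);
data9 = FOREACH data8 GENERATE n, B::s;
DUMP data9;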

The input is:
4|{(1,abc),(2,cde),(5,efg)}
2|{(1,foo),(2,bar),(5,baz)}
7|{(1,bounce),(2,frotz),(5,trotz)}

The output is:
(2,bar)
(4,cde)
(7,trotz)
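To check that output without a Pig installation, the dataflow can be modelled step by step in plain Python. This is an illustrative sketch, not Pig: the function name `pipeline` and the in-memory `rows` list are assumptions standing in for `1.txt` and the relations `data1` to `data9`. Pig itself does not guarantee any output order after GROUP; the sketch sorts by n only because `itertools.groupby` needs equal keys to be adjacent.

```python
from itertools import groupby

# In-memory stand-in for 1.txt: (n, bag) rows, each bag a list of (m, s) tuples.
rows = [
    (4, [(1, "abc"), (2, "cde"), (5, "efg")]),
    (2, [(1, "foo"), (2, "bar"), (5, "baz")]),
    (7, [(1, "bounce"), (2, "frotz"), (5, "trotz")]),
]


def pipeline(rows):
    """Hypothetical plain-Python model of data1..data9 from the Pig script."""
    # data2: FLATTEN(B) gives one (n, m, s) record per bag element.
    data2 = [(n, m, s) for n, bag in rows for (m, s) in bag]
    # data3: FILTER ... BY B::m <= n keeps only the eligible records.
    data3 = [r for r in data2 if r[1] <= r[0]]
    # data4: GROUP BY n (sorted first so groupby sees each key once).
    data3.sort(key=lambda r: r[0])
    result = []
    for n, group in groupby(data3, key=lambda r: r[0]):
        # data6/data7: ORDER BY B::m DESC, LIMIT 1 -> record with the largest m.
        best = sorted(group, key=lambda r: r[1], reverse=True)[0]
        # data9: project n and B::s.
        result.append((n, best[2]))
    return result


print(pipeline(rows))  # [(2, 'bar'), (4, 'cde'), (7, 'trotz')]
```

As in the Pig script, a row with no element satisfying m <= n disappears at the FILTER step, so it produces no output record at all.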

Thanks,
Cheolsoo
On Tue, Jan 22, 2013 at 8:24 AM, Thomas Bach <[EMAIL PROTECTED]> wrote:

> On Tue, Jan 22, 2013 at 12:55:22PM +0100, Thomas Bach wrote:
> > Hi there,
> >
> > I have the following data
> >
> > 4     {(1,abc),(2,cde),(5,efg)}
> > 2     {(1,foo),(2,bar),(5,baz)}
> > 7     {(1,bounce),(2,frotz),(5,trotz)}
> >
> > What I ultimately want is, for each row, the string paired with the
> > largest number in the bag that is less than or equal to the first
> > number in the row, i.e.:
> >
> > (4,cde)
> > (2,bar)
> > (5,trotz)
> >
>
> This should be
>
> (4,cde)
> (2,bar)
> (7,trotz)
>
> of course.
>
> Regards,
>         Thomas Bach.
>
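Thomas's requirement can also be read row by row, without flattening or grouping: for each row, keep the pairs whose m is at most n and take the one with the largest m. The Python sketch below states that definition directly. It is plain Python rather than Pig, and the function name `best_string` is hypothetical. It returns None for a row with no qualifying pair, whereas Cheolsoo's FILTER step drops such a row entirely. For ties on m, Python's `max` returns the first maximal pair it encounters, while Pig's ORDER followed by LIMIT 1 makes no such promise.

```python
from typing import List, Optional, Tuple

Pair = Tuple[int, str]


def best_string(n: int, bag: List[Pair]) -> Optional[str]:
    """Return s for the pair with the largest m <= n, or None if none qualifies."""
    eligible = [(m, s) for (m, s) in bag if m <= n]
    if not eligible:
        return None
    # max() keeps the first pair among equal m values.
    return max(eligible, key=lambda p: p[0])[1]


rows = [
    (4, [(1, "abc"), (2, "cde"), (5, "efg")]),
    (2, [(1, "foo"), (2, "bar"), (5, "baz")]),
    (7, [(1, "bounce"), (2, "frotz"), (5, "trotz")]),
]

result = [(n, best_string(n, bag)) for n, bag in rows]
print(result)  # [(4, 'cde'), (2, 'bar'), (7, 'trotz')]
```

This keeps the rows in input order and matches the corrected list in Thomas's follow-up.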