Re: Get field from bag with constraints from same relation
Hi Thomas,

Try this:

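-- Load rows of the form n|{(m,s),...}
data1 = LOAD '1.txt' USING PigStorage('|') AS (n:int, B:bag{(m:int,s:chararray)});
-- One output row per bag tuple: (n, B::m, B::s)
data2 = FOREACH data1 GENERATE n, FLATTEN(B);
-- Keep only the tuples whose m does not exceed n
data3 = FILTER data2 BY B::m <= n;
data4 = GROUP data3 BY n;
-- Within each group, keep the tuple with the largest m
data5 = FOREACH data4 {
    data6 = ORDER data3 BY B::m DESC;
    data7 = LIMIT data6 1;
    GENERATE data7;
}
data8 = FOREACH data5 GENERATE FLATTEN(data7);
-- Project the row key and the selected string
data9 = FOREACH data8 GENERATE n, B::s;
DUMP data9;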

The input is:
4|{(1,abc),(2,cde),(5,efg)}
2|{(1,foo),(2,bar),(5,baz)}
7|{(1,bounce),(2,frotz),(5,trotz)}

The output is:
(2,bar)
(4,cde)
(7,trotz)
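
As a quick cross-check of the expected output, here is the same selection rule written in plain Python rather than Pig. This is only a sketch of the logic (for each row, take the string whose number is the largest one not exceeding n); the names `rows` and `pick` are made up for illustration and are not part of the script above.

```python
# Plain-Python cross-check of the selection rule (not Pig).
# `rows` mirrors the sample input; `pick` is a hypothetical helper.
rows = [
    (4, [(1, "abc"), (2, "cde"), (5, "efg")]),
    (2, [(1, "foo"), (2, "bar"), (5, "baz")]),
    (7, [(1, "bounce"), (2, "frotz"), (5, "trotz")]),
]


def pick(n, bag):
    """Return (n, s) for the bag tuple with the largest m such that m <= n.

    Returns None when no tuple qualifies.
    """
    candidates = [(m, s) for m, s in bag if m <= n]
    if not candidates:
        return None
    _, s = max(candidates, key=lambda t: t[0])
    return (n, s)


# Sort by n so the order matches the DUMP output shown above.
result = sorted(r for r in (pick(n, bag) for n, bag in rows) if r is not None)
for r in result:
    print(r)
```

Note that in the Pig script a row with no tuple satisfying m <= n disappears at the FILTER step, and rows sharing the same n end up in one group, so only one tuple survives for them.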

Thanks,
Cheolsoo
On Tue, Jan 22, 2013 at 8:24 AM, Thomas Bach <[EMAIL PROTECTED]> wrote:

> On Tue, Jan 22, 2013 at 12:55:22PM +0100, Thomas Bach wrote:
> > Hi there,
> >
> > I have the following data
> >
> > 4     {(1,abc),(2,cde),(5,efg)}
> > 2     {(1,foo),(2,bar),(5,baz)}
> > 7     {(1,bounce),(2,frotz),(5,trotz)}
> >
> > What I finally want to achieve is, for each row, the string paired
> > with the largest number in the bag that is less than or equal to the
> > first number in the row, i.e.:
> >
> > (4,cde)
> > (2,bar)
> > (5,trotz)
> >
>
> This should be
>
> (4,cde)
> (2,bar)
> (7,trotz)
>
> of course.
>
> Regards,
>         Thomas Bach.
>