NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig user mailing list: Rank within a group


M G 2013-04-15, 20:25
Johnny Zhang 2013-04-15, 20:58
M G 2013-04-15, 21:10
Hi,

Nested RANK (RANK inside a nested FOREACH block) is not supported yet, but it is easy to implement as a UDF:
just sort the records in each group's bag and assign an increasing counter in the UDF.
We will probably add support for nested RANK in the next release.
Cheers,

--
Gianmarco
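The UDF approach above (sort each group's bag, then assign an increasing counter) can be sketched in plain Python. This is a minimal sketch under stated assumptions: it assumes the UDF receives each group's bag as a list of tuples, which is how Pig hands bags to Jython UDFs, and the names `rank_bag` and `overall_and_group_rank` are hypothetical. A real Jython UDF would additionally need Pig's `@outputSchema` decorator and a `REGISTER '...' USING jython` statement; here the GROUP BY is emulated with `itertools.groupby` so the example runs standalone and reproduces the output Mythili asked for.

```python
from itertools import groupby
from operator import itemgetter


def rank_bag(bag, key_index, descending=False):
    """Hypothetical core of a nested-RANK UDF: sort one group's bag on a
    single field and prepend an increasing counter starting at 1.
    Ties get consecutive counters (a row number), not equal ranks."""
    ordered = sorted(bag, key=itemgetter(key_index), reverse=descending)
    return [(counter,) + tuple(row) for counter, row in enumerate(ordered, start=1)]


# Tuples shaped like relation b in Johnny's script after `rank a by income`:
# (overall_rank, name, industry, income)
ranked = [
    (1, "John", "Banking", 20000),
    (5, "Jane", "Banking", 35000),
    (4, "Chen", "Real Estate", 30000),
    (2, "Hari", "Real Estate", 22000),
    (3, "Asha", "Technology", 26000),
]


def overall_and_group_rank(rows):
    """Emulate `group b by industry` plus the UDF on each bag, then reorder
    fields to (overall_rank, group_rank, name, industry, income)."""
    result = []
    by_industry = itemgetter(2)
    # groupby only merges adjacent rows, so sort by the grouping key first.
    for _, bag in groupby(sorted(rows, key=by_industry), key=by_industry):
        for group_rank, overall, name, industry, income in rank_bag(list(bag), key_index=3):
            result.append((overall, group_rank, name, industry, income))
    return result


for row in overall_and_group_rank(ranked):
    print(row)
```

Because the counter simply increments, two equal incomes in the same industry would get different group ranks. Pig's top-level `RANK ... BY` gives tied rows the same rank, so a UDF meant to mirror it exactly would also have to compare each value with the previous one before incrementing.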
On Mon, Apr 15, 2013 at 11:10 PM, M G <[EMAIL PROTECTED]> wrote:

> Hi Johnny Zhang:
>
>
> What I am looking for is overall rank and rank within each group. Sorry if
> I was not clear.
>
> What I am looking to get is something like this.
>
> (1, 1, John, Banking, 20000)
> (5, 2, Jane, Banking, 35000)
> (3, 1, Asha, Technology, 26000)
> (2, 1, Hari, Real Estate, 22000)
> (4, 2, Chen, Real Estate, 30000)
>
> Thanks,
> Mythili
>
>
> On Mon, Apr 15, 2013 at 1:58 PM, Johnny Zhang <[EMAIL PROTECTED]> wrote:
>
> > Hi, M G:
> > for the input data
> > John    Banking 20000
> > Jane    Banking 35000
> > Chen    Real Estate     30000
> > Hari    Real Estate     22000
> > Asha    Technology      26000
> >
> >
> > a = load '/var/lib/jenkins/income' as (name:chararray, industry:chararray, income:int);
> > b = rank a by income;
> > c = group b by industry;
> > d = foreach c generate flatten(b);
> > dump d;
> >
> > output is:
> > (1,John,Banking,20000)
> > (5,Jane,Banking,35000)
> > (3,Asha,Technology,26000)
> > (2,Hari,Real Estate,22000)
> > (4,Chen,Real Estate,30000)
> >
> > Johnny
> >
> >
> > On Mon, Apr 15, 2013 at 1:25 PM, M G <[EMAIL PROTECTED]> wrote:
> >
> > > Is there a way to do RANK within a group in PIG 0.11.1?
> > >
> > > In the following sample dataset, I would like to RANK DESC by Income, and
> > > further RANK by Income for each Industry.
> > >
> > > Name   Industry      Income
> > >
> > > John   Banking       20,000
> > > Jane   Banking       35,000
> > > Chen   Real Estate   30,000
> > > Hari   Real Estate   22,000
> > > Asha   Technology    26,000
> > >
> > > I tried something like this, but I get a syntax error.
> > >
> > > names_by_ind = group names by industry;
> > >
> > > rank_by_ind = foreach names_by_ind {
> > >     results = RANK names BY income DESC;
> > >     GENERATE flatten(results);
> > > }
> > >
> >
>
M G 2013-04-19, 02:16