Pig >> mail # user >> Rank within a group


M G 2013-04-15, 20:25
Johnny Zhang 2013-04-15, 20:58
M G 2013-04-15, 21:10
Hi,

Nested RANK is not supported yet; however, it is easy to implement as a UDF.
Just sort the records within each group and have the UDF assign an increasing
counter to them. We will probably add support for nested RANK in the next release.
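
A minimal sketch of such a UDF, written in Python for Pig's Jython support.
This is only an illustration, not code from this thread: the function name
enumerate_bag, the output schema, and the (name, industry, income) field
layout are assumptions, and it assumes Pig hands the bag to the function as
an already sorted sequence of tuples. The outputSchema decorator is normally
provided by Pig when a script is registered with "USING jython"; the fallback
below exists only so the file also loads under plain Python.

```python
# Sketch of a counter UDF for "rank within a group".
# Assumption: Pig supplies `outputSchema` for Jython UDF scripts.
# The no-op fallback is only for running this file outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


# Hypothetical schema: a group-local rank followed by the original fields.
@outputSchema("ranked:bag{t:tuple(group_rank:int, name:chararray, "
              "industry:chararray, income:int)}")
def enumerate_bag(sorted_bag):
    """Prepend a 1-based counter to each tuple, in the order received.

    The function does not sort; the caller is expected to pass records
    that are already ordered (for example by income).
    """
    ranked = []
    counter = 0
    for record in sorted_bag:
        counter += 1
        ranked.append((counter,) + tuple(record))
    return ranked
```

In a script this would typically be registered with REGISTER ... USING jython
(the file name is up to you) and called inside a nested FOREACH on the grouped
bag, after an ORDER ... BY income on that bag. Because the function only
counts, ties receive consecutive numbers rather than equal ranks.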
Cheers,

--
Gianmarco
On Mon, Apr 15, 2013 at 11:10 PM, M G <[EMAIL PROTECTED]> wrote:

> Hi Johnny Zhang:
>
>
> What I am looking for is overall rank and rank within each group. Sorry if
> I was not clear.
>
> What I am looking to get is something like this.
>
> (1, 1, John, Banking, 20000)
> (5, 2, Jane, Banking, 35000)
> (3, 1, Asha, Technology, 26000)
> (2, 1, Hari, Real Estate, 22000)
> (4, 2, Chen, Real Estate, 30000)
>
> Thanks,
> Mythili
>
>
> On Mon, Apr 15, 2013 at 1:58 PM, Johnny Zhang <[EMAIL PROTECTED]> wrote:
>
> > Hi, M G:
> > for input data
> > John    Banking 20000
> > Jane    Banking 35000
> > Chen    Real Estate     30000
> > Hari    Real Estate     22000
> > Asha    Technology      26000
> >
> >
> > a = load '/var/lib/jenkins/income' as (name:chararray, industry:chararray, income:int);
> > b = rank a by income;
> > c = group b by industry;
> > d = foreach c generate flatten(b);
> > dump d;
> >
> > output is:
> > (1,John,Banking,20000)
> > (5,Jane,Banking,35000)
> > (3,Asha,Technology,26000)
> > (2,Hari,Real Estate,22000)
> > (4,Chen,Real Estate,30000)
> >
> > Johnny
> >
> >
> > On Mon, Apr 15, 2013 at 1:25 PM, M G <[EMAIL PROTECTED]> wrote:
> >
> > > Is there a way to do RANK within a group in Pig 0.11.1?
> > >
> > > In the following sample dataset, I would like to RANK DESC by Income, and
> > > further RANK by Income for each Industry.
> > >
> > > Name  Industry Income
> > >
> > > John,Banking, 20,000
> > > Jane, Banking, 35,000
> > > Chen,Real Estate, 30,000
> > > Hari, Real Estate, 22,000
> > > Asha, Technology, 26,000
> > >
> > > I tried something like this, but I get a syntax error.
> > >
> > > names_by_ind = group names by industry;
> > >
> > > rank_by_ind = foreach names_by_ind {
> > > results = RANK names BY income DESC;
> > > GENERATE flatten(results);
> > > }
> > >
> >
>
M G 2013-04-19, 02:16