Pig user mailing list: Rank within a group


M G 2013-04-15, 20:25
Johnny Zhang 2013-04-15, 20:58
M G 2013-04-15, 21:10
Gianmarco De Francisci Mo... 2013-04-16, 19:00

Re: Rank within a group
Thanks a lot for your response. Much appreciated.

Mythili
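The UDF approach in the quoted reply from Gianmarco (sort each group's records and assign an increasing counter) could look roughly like the Python UDF below, which Pig 0.11 can run through Jython. This is a sketch under stated assumptions, not code from the thread. The file name `rank_udf.py`, the function name `rank_in_group`, the output schema, and the assumption that Pig passes the grouped bag in as a list of tuples whose last field is `income` are all hypothetical.

```python
# Hypothetical file: rank_udf.py (a Jython-style Pig UDF sketch).
# Assumed Pig usage, reusing the relation names from the quoted script:
#   REGISTER 'rank_udf.py' USING jython AS rankudf;
#   b = RANK a BY income;
#   c = GROUP b BY industry;
#   d = FOREACH c GENERATE FLATTEN(rankudf.rank_in_group(b));

try:
    outputSchema  # assumed to be injected by Pig's Jython engine on REGISTER
except NameError:
    # Standalone fallback (assumption): a no-op decorator so the function
    # can be exercised outside Pig.
    def outputSchema(schema):
        def decorate(func):
            return func
        return decorate


# Output schema is an assumption: the input fields plus an in-group counter.
@outputSchema("ranked:bag{t:(rank_a:long, name:chararray, industry:chararray, income:int, rank_in_group:int)}")
def rank_in_group(bag):
    """Sort one group's tuples by their last field (income), ascending,
    and append an increasing counter starting at 1."""
    rows = sorted(bag, key=lambda t: t[-1])
    result = []
    counter = 0
    for row in rows:
        counter += 1
        result.append(tuple(row) + (counter,))
    return result
```

The counter always increases, so two rows with the same income in one industry would receive different in-group ranks. `RANK ... BY` instead gives tied values the same rank, so the two columns can disagree on ties. To get the DESC ordering asked for in the original question, pass `reverse=True` to `sorted`.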

On Tue, Apr 16, 2013 at 12:00 PM, Gianmarco De Francisci Morales <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Nested RANK is not supported yet, but it is easy to implement as a UDF:
> just sort the records and assign an increasing counter in the UDF.
> We will probably add support for nested RANK in the next release.
>
>
> Cheers,
>
> --
> Gianmarco
>
>
> On Mon, Apr 15, 2013 at 11:10 PM, M G <[EMAIL PROTECTED]> wrote:
>
> > Hi Johnny Zhang:
> >
> >
> > What I am looking for is the overall rank and the rank within each group.
> > Sorry if I was not clear.
> >
> > What I am looking to get is something like this:
> >
> > (1, 1, John, Banking, 20000)
> > (5, 2, Jane, Banking, 35000)
> > (3, 1, Asha, Technology, 26000)
> > (2, 1, Hari, Real Estate, 22000)
> > (4, 2, Chen, Real Estate, 30000)
> >
> > Thanks,
> > Mythili
> >
> >
> > On Mon, Apr 15, 2013 at 1:58 PM, Johnny Zhang <[EMAIL PROTECTED]> wrote:
> >
> > > Hi M G,
> > > For the input data:
> > > John    Banking 20000
> > > Jane    Banking 35000
> > > Chen    Real Estate     30000
> > > Hari    Real Estate     22000
> > > Asha    Technology      26000
> > >
> > >
> > > a = load '/var/lib/jenkins/income' as (name:chararray, industry:chararray, income:int);
> > > b = rank a by income;
> > > c = group b by industry;
> > > d = foreach c generate flatten(b);
> > > dump d;
> > >
> > > The output is:
> > > (1,John,Banking,20000)
> > > (5,Jane,Banking,35000)
> > > (3,Asha,Technology,26000)
> > > (2,Hari,Real Estate,22000)
> > > (4,Chen,Real Estate,30000)
> > >
> > > Johnny
> > >
> > >
> > > On Mon, Apr 15, 2013 at 1:25 PM, M G <[EMAIL PROTECTED]> wrote:
> > >
> > > > Is there a way to do RANK within a group in Pig 0.11.1?
> > > >
> > > > In the following sample dataset, I would like to RANK by Income DESC,
> > > > and further RANK by Income within each Industry.
> > > >
> > > > Name   Industry      Income
> > > > John   Banking       20,000
> > > > Jane   Banking       35,000
> > > > Chen   Real Estate   30,000
> > > > Hari   Real Estate   22,000
> > > > Asha   Technology    26,000
> > > >
> > > > I tried something like this, but I get a syntax error:
> > > >
> > > > names_by_ind = group names by industry;
> > > >
> > > > rank_by_ind = foreach names_by_ind {
> > > > results = RANK names BY income DESC;
> > > > GENERATE flatten(results);
> > > > }
> > > >
> > >
> >
>
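If you want to check the expected result outside Pig, the thread's goal (an overall income rank plus a rank within each industry) can be modeled in plain Python. This is not Pig code and not a Pig API. The function names are hypothetical, and ties are assumed to share a rank, as with `RANK ... BY`. With `descending=False` it yields the same five tuples listed in the Apr 15, 21:10 message (in input order rather than Pig's group order). With `descending=True` it follows the "RANK by Income DESC" wording of the original question.

```python
from collections import defaultdict


def competition_ranks(values, descending):
    """Map each distinct value to its rank; tied values share the lowest
    position, and the next distinct value skips ahead (1, 1, 3, ...)."""
    ranks = {}
    for position, value in enumerate(sorted(values, reverse=descending), start=1):
        ranks.setdefault(value, position)  # keep the first position seen for ties
    return ranks


def rank_overall_and_in_group(rows, descending=True):
    """Return (overall_rank, rank_in_industry, name, industry, income) per row,
    with rows given as (name, industry, income) tuples, in input order."""
    overall = competition_ranks([income for _, _, income in rows], descending)
    incomes_by_industry = defaultdict(list)
    for _, industry, income in rows:
        incomes_by_industry[industry].append(income)
    in_group = {
        industry: competition_ranks(incomes, descending)
        for industry, incomes in incomes_by_industry.items()
    }
    return [
        (overall[income], in_group[industry][income], name, industry, income)
        for name, industry, income in rows
    ]


# Sample data from the thread.
SAMPLE = [
    ("John", "Banking", 20000),
    ("Jane", "Banking", 35000),
    ("Chen", "Real Estate", 30000),
    ("Hari", "Real Estate", 22000),
    ("Asha", "Technology", 26000),
]

if __name__ == "__main__":
    for row in rank_overall_and_in_group(SAMPLE, descending=False):
        print(row)
```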