Pig, mail # user - Rank within a group


M G 2013-04-15, 20:25
Re: Rank within a group
Johnny Zhang 2013-04-15, 20:58
Hi M G,

With this input data:
John    Banking 20000
Jane    Banking 35000
Chen    Real Estate     30000
Hari    Real Estate     22000
Asha    Technology      26000
and this script:
a = load '/var/lib/jenkins/income' as (name:chararray, industry:chararray, income:int);
b = rank a by income;
c = group b by industry;
d = foreach c generate flatten(b);
dump d;

the output is:
(1,John,Banking,20000)
(5,Jane,Banking,35000)
(3,Asha,Technology,26000)
(2,Hari,Real Estate,22000)
(4,Chen,Real Estate,30000)

Johnny
On Mon, Apr 15, 2013 at 1:25 PM, M G <[EMAIL PROTECTED]> wrote:

> Is there a way to do RANK within a group in Pig 0.11.1?
>
> In the following sample dataset, I would like to rank by Income in
> descending order, and further rank by Income within each Industry.
>
> Name, Industry, Income
>
> John, Banking, 20,000
> Jane, Banking, 35,000
> Chen, Real Estate, 30,000
> Hari, Real Estate, 22,000
> Asha, Technology, 26,000
>
> I tried something like this, but I get a syntax error.
>
> names_by_ind = group names by industry;
>
> rank_by_ind = foreach names_by_ind {
> results = RANK names BY income DESC;
> GENERATE flatten(results);
> }
>
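
The ranks in the output above run from 1 to 5 across the whole relation, so they are global rather than per industry. The quoted attempt most likely fails because RANK is not among the operators Pig accepts inside a nested FOREACH block, while ORDER is. One possible workaround, sketched below, sorts each group with a nested ORDER and numbers the sorted tuples with the Enumerate UDF from the DataFu library. This is only a sketch under assumptions: DataFu is not mentioned in the thread, the jar path is a placeholder, and the input path and schema are borrowed from the reply above.

```pig
-- Assumption: the DataFu jar is available; 'datafu.jar' is a placeholder path.
register 'datafu.jar';

-- Enumerate appends a running index to each tuple of a bag; '1' is the start value.
define Enumerate datafu.pig.bags.Enumerate('1');

names = load '/var/lib/jenkins/income'
        as (name:chararray, industry:chararray, income:int);

names_by_ind = group names by industry;

rank_by_ind = foreach names_by_ind {
    -- Sort the tuples of this industry by income, highest first.
    sorted = order names by income desc;
    -- Number the sorted tuples; the index is added as the last field.
    generate flatten(Enumerate(sorted));
};

dump rank_by_ind;
```

Each output tuple should then carry name, industry, income, and its position within the industry. Note that this numbers rows rather than ranking them in the strict sense: two people with the same income in one industry would receive different positions, unlike RANK, which assigns tied rows the same rank.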
M G 2013-04-15, 21:10
Gianmarco De Francisci Mo... 2013-04-16, 19:00
M G 2013-04-19, 02:16