|
|
-
Sum and count of fields
Shin Chan 2012-04-15, 05:44
Hi
I have data something like
f1,f2,f3,f4,f5
Rows with 5 fields
I have to produce final dump output as
f1,f2,f3, SUM( all fields at f4 position) , COUNT ( number of fields at f5 position ) , f4 , f5
I am trying to do using group , foreach , flatten but its getting confusing. Since flatten produce cross product i am getting lots of rows. My final output should have same number of rows as input but with SUM , COUNT columns added.
Thanks in advance for your help
-
Re: Sum and count of fields
Prashant Kommireddi 2012-04-15, 05:52
You mean count of distinct elements at f5?
Sent from my iPhone
On Apr 14, 2012, at 10:44 PM, Shin Chan <[EMAIL PROTECTED]> wrote:
> Hi > > I have data something like > > f1,f2,f3,f4,f5 > > Rows with 5 fields > > I have to produce final dump output as > > f1,f2,f3, SUM( all fields at f4 position) , COUNT ( number of fields at f5 position ) , f4 , f5 > > I am trying to do using group , foreach , flatten but its getting confusing. Since flatten produce cross product i am getting lots of rows. > My final output should have same number of rows as input but with SUM , COUNT columns added. > > Thanks in advance for your help
-
Re: Sum and count of fields
Shin Chan 2012-04-15, 06:22
Yes, I know for distinct count i can do something like
Lets say inputR contains my f1,f2,f3,f4,f5
inputRelation = group inputR ALL;
calculate = foreach inputRelation {
F4 = inputR.f4; F5 = inputR.f5; disF5 = distinct F5;
generate SUM(F4) , COUNT (distF5);
};
This produces correct result for SUM and COUNT required
But how to produce the result in required output format
f1,f2,f3, SUM( all fields at f4 position) , COUNT ( number of fields at f5 position ) , f4 , f5 When i try to use flatten , lots of things comes into picture.
I need to read more about flatten , group , nestedforeach i guess. ----- Original Message ----- From: Prashant Kommireddi Sent: 04/15/12 11:22 AM To: [EMAIL PROTECTED] Subject: Re: Sum and count of fields
You mean count of distinct elements at f5? Sent from my iPhone On Apr 14, 2012, at 10:44 PM, Shin Chan <[EMAIL PROTECTED]> wrote: > Hi > > I have data something like > > f1,f2,f3,f4,f5 > > Rows with 5 fields > > I have to produce final dump output as > > f1,f2,f3, SUM( all fields at f4 position) , COUNT ( number of fields at f5 position ) , f4 , f5 > > I am trying to do using group , foreach , flatten but its getting confusing. Since flatten produce cross product i am getting lots of rows. > My final output should have same number of rows as input but with SUM , COUNT columns added. > > Thanks in advance for your help Thanks and Regards ,
-
Re: Sum and count of fields
Dmitriy Ryaboy 2012-04-15, 21:48
You can use Pig scalars to do this:
totals = foreach (group inputR all) { f5s = distinct f5; generate SUM(f4) as sum_f4, COUNT_STAR(f5s) as counts_f5;
results = foreach inputR generate f1, f2, f3, (long) totals.sum_f4, (long) counts_f5, f4, f5;
Given this: x y z 5 foo a b c 7 foo d e f 8 bar
This should generate x y z 20 2 5 foo a b c 20 2 7 foo d e f 20 2 8 bar
-Dmitriy
On Sat, Apr 14, 2012 at 11:22 PM, Shin Chan <[EMAIL PROTECTED]> wrote: > Yes, > > > I know for distinct count i can do something like > > Lets say inputR contains my f1,f2,f3,f4,f5 > > inputRelation = group inputR ALL; > > calculate = foreach inputRelation { > > F4 = inputR.f4; > F5 = inputR.f5; > disF5 = distinct F5; > > generate SUM(F4) , COUNT (distF5); > > }; > > This produces correct result for SUM and COUNT required > > But how to produce the result in required output format > > f1,f2,f3, SUM( all fields at f4 position) , COUNT ( number of fields at f5 position ) , f4 , f5 > > > When i try to use flatten , lots of things comes into picture. > > I need to read more about flatten , group , nestedforeach i guess. > > > ----- Original Message ----- > From: Prashant Kommireddi > Sent: 04/15/12 11:22 AM > To: [EMAIL PROTECTED] > Subject: Re: Sum and count of fields > > You mean count of distinct elements at f5? Sent from my iPhone On Apr 14, 2012, at 10:44 PM, Shin Chan <[EMAIL PROTECTED]> wrote: > Hi > > I have data something like > > f1,f2,f3,f4,f5 > > Rows with 5 fields > > I have to produce final dump output as > > f1,f2,f3, SUM( all fields at f4 position) , COUNT ( number of fields at f5 position ) , f4 , f5 > > I am trying to do using group , foreach , flatten but its getting confusing. Since flatten produce cross product i am getting lots of rows. > My final output should have same number of rows as input but with SUM , COUNT columns added. > > Thanks in advance for your help > > > > > Thanks and Regards ,
-
Re: Sum and count of fields
Shin Chan 2012-04-16, 07:39
Dear Dmitriy,
Thank You ,
It worked :) ----- Original Message ----- From: Dmitriy Ryaboy Sent: 04/15/12 05:48 PM To: [EMAIL PROTECTED] Subject: Re: Sum and count of fields
You can use Pig scalars to do this: totals = foreach (group inputR all) { f5s = distinct f5; generate SUM(f4) as sum_f4, COUNT_STAR(f5s) as counts_f5; results = foreach inputR generate f1, f2, f3, (long) totals.sum_f4, (long) counts_f5, f4, f5; Given this: x y z 5 foo a b c 7 foo d e f 8 bar This should generate x y z 20 2 5 foo a b c 20 2 7 foo d e f 20 2 8 bar -Dmitriy On Sat, Apr 14, 2012 at 11:22 PM, Shin Chan <[EMAIL PROTECTED]> wrote: > Yes, > > > I know for distinct count i can do something like > > Lets say inputR contains my f1,f2,f3,f4,f5 > > inputRelation = group inputR ALL; > > calculate = foreach inputRelation { > > F4 = inputR.f4; > F5 = inputR.f5; > disF5 = distinct F5; > > generate SUM(F4) , COUNT (distF5); > > }; > > This produces correct result for SUM and COUNT required > > But how to produce the result in required output format > > f1,f2,f3, SUM( all fields at f4 position) , COUNT ( number of fields at f5 position ) , f4 , f5 > > > When i try to use flatten , lots o f things comes into picture. > > I need to read more about flatten , group , nestedforeach i guess. > > > ----- Original Message ----- > From: Prashant Kommireddi > Sent: 04/15/12 11:22 AM > To: [EMAIL PROTECTED] > Subject: Re: Sum and count of fields > > You mean count of distinct elements at f5? Sent from my iPhone On Apr 14, 2012, at 10:44 PM, Shin Chan <[EMAIL PROTECTED]> wrote: > Hi > > I have data something like > > f1,f2,f3,f4,f5 > > Rows with 5 fields > > I have to produce final dump output as > > f1,f2,f3, SUM( all fields at f4 position) , COUNT ( number of fields at f5 position ) , f4 , f5 > > I am trying to do using group , foreach , flatten but its getting confusing. Since flatten produce cross product i am getting lots of rows. > My final output should have same number of rows as input but with SUM , COUNT columns added. > > Thanks in advance for your help > > > > > Thanks and Regards , Thanks and Regards ,
|
|