Yeah, there was a bug in my "stats" data.
I was wondering how I can calculate an average in Pig.
Something like:
But in the top response, it seems that the user wanted to calculate the
average across all of the data:
count = COUNT(inpt)
where inpt is the complete input,
whereas what I want is for the denominator to be the count for each id.
So my data is like:
So, the average I am expecting is:
as (1 + 3 + 5) / 3 = 3
whereas in the example, COUNT(inpt) would give me 4?
How do I achieve this?
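
One possible approach, as a minimal sketch: group the rows by id first and then apply AVG to each group, so that the denominator is the row count of that id rather than of the whole input. The file name 'input' and the field names id and val are placeholders assumed for illustration, not taken from the real data.

```pig
-- Hypothetical input: one "id,val" pair per line.
inpt = LOAD 'input' USING PigStorage(',') AS (id:chararray, val:double);

-- Collect all rows that share an id into one bag per id.
grouped = GROUP inpt BY id;

-- AVG and COUNT run over each group's bag, so both are per id.
per_id = FOREACH grouped GENERATE
    group         AS id,
    AVG(inpt.val) AS mean,
    COUNT(inpt)   AS n;

DUMP per_id;
```

With the values 1, 3 and 5 under one id, the mean for that id is 3, no matter how many rows the other ids contribute.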
On Mon, Apr 1, 2013 at 2:24 PM, Mehmet Tepedelenlioglu <[EMAIL PROTECTED]> wrote:
> Are your ids unique?
> On 4/1/13 2:06 PM, "jamal sasha" <[EMAIL PROTECTED]> wrote:
> > I have a simple join question.
> >base = load 'input1' USING PigStorage( ',' ) as (id1, field1, field2);
> >stats = load 'input2' USING PigStorage(',') as (id1, mean, median);
> >joined = JOIN base BY id1, stats BY id1;
> >final = FOREACH joined GENERATE base::id1, base::field1,base::field2,
> >STORE final INTO 'output' USING PigStorage( ',' );
> >But something doesn't feel right.
> >Inputs are on the order of MBs, whereas outputs are like 100 GB.
> >I tried it on a sample file
> >where base is 35 MB
> >and stats is 10 MB,
> >and the output explodes to GBs?
> >What am I missing?
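
A common cause of a join output far larger than its inputs is repeated join keys: an id1 that appears m times in base and n times in stats produces m * n joined rows. A rough sketch for checking this on stats, reusing the load statement from the quoted script (the relation names stats_by_id, key_counts and dup_keys are made up for this example):

```pig
stats = LOAD 'input2' USING PigStorage(',') AS (id1, mean, median);

-- Count how many rows each id1 has in stats.
stats_by_id = GROUP stats BY id1;
key_counts  = FOREACH stats_by_id GENERATE group AS id1, COUNT(stats) AS n;

-- Any id1 with n > 1 multiplies the matching rows from base in the join.
dup_keys = FILTER key_counts BY n > 1;

DUMP dup_keys;
```

If dup_keys comes back empty, the same check can be run against base to see whether the duplicates are on that side instead.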