Pig >> mail # user >> Join question


jamal sasha 2013-04-01, 21:06
Mehmet Tepedelenlioglu 2013-04-01, 21:24
Re: Join question
Hi,
  Yeah, there was a bug in my "stats" data.
I was wondering how I can calculate an average in Pig.
Something like:
http://stackoverflow.com/questions/12593527/finding-mean-using-pig-or-hadoop

But in the top response, it seems the user wanted to calculate the
average across all the data, as

count = COUNT(inpt)
and inpt is the complete input,
whereas what I want is for the denominator to be the count for each id.

so my data is like:

id, value
1,1.0
1,3.0
1,5.0
2,1.0

So, the average I am expecting is:

1,3.0
2,1.0

as (1 + 3 + 5) / 3 = 3,
whereas in the example COUNT(inpt) would give me 4?

How do I achieve this?
Thanks
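
A per-id average like the one described above can be sketched by grouping on id before aggregating, so that the denominator is the count within each group rather than the count of the whole input. This is only a minimal sketch: the path 'input' and the relation names data, grouped, and avgs are hypothetical, and it assumes the file holds the rows without the "id, value" header line.

```pig
-- Hypothetical path and relation names; only the (id, value) layout comes from the message.
data    = LOAD 'input' USING PigStorage(',') AS (id:int, value:double);

-- One group per id; each group carries a bag of that id's rows.
grouped = GROUP data BY id;

-- AVG runs over the bag of values for each id, so it divides by the per-id count.
avgs    = FOREACH grouped GENERATE group AS id, AVG(data.value) AS mean;

DUMP avgs;
```

For the four sample rows this should give 3.0 for id 1 and 1.0 for id 2. Writing SUM(data.value) / COUNT(data) in the FOREACH is an equivalent way to spell out the same per-group division.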
On Mon, Apr 1, 2013 at 2:24 PM, Mehmet Tepedelenlioglu <[EMAIL PROTECTED]>
wrote:
>
> Are your ids unique?
>
> On 4/1/13 2:06 PM, "jamal sasha" <[EMAIL PROTECTED]> wrote:
>
> >Hi,
> >  I have a simple join question.
> >base = load 'input1'   USING PigStorage( ',' ) as (id1, field1, field2);
> >stats = load 'input2' USING PigStorage(',') as (id1, mean, median);
> >joined = JOIN base BY  id1, stats BY id1;
> >final = FOREACH joined GENERATE base::id1, base::field1,base::field2,
> >stats::mean,stats::median;
> >STORE final INTO   'output'   USING PigStorage( ',' );
> >
> >But something doesn't feel right.
> >The inputs are on the order of MBs, whereas the outputs are around 100 GB.
> >
> >I tried it on a sample file
> >where base is 35 MB
> >and stats is 10 MB,
> >and the output explodes to GBs.
> >What am I missing?
>
>
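
The question about unique ids points at one plausible cause of the blow-up in the quoted script: a JOIN pairs every base row with every stats row that shares its id1, so repeated keys on both sides multiply the output. A minimal sketch for checking this on the stats side follows; it reuses the load statement from the quoted script, while the relation names by_id, counts, and dupes are hypothetical.

```pig
stats  = LOAD 'input2' USING PigStorage(',') AS (id1, mean, median);

-- Count how many stats rows exist for each id1.
by_id  = GROUP stats BY id1;
counts = FOREACH by_id GENERATE group AS id1, COUNT(stats) AS n;

-- Any id1 listed here appears more than once and will fan out in the join.
dupes  = FILTER counts BY n > 1;

DUMP dupes;
```

If dupes comes back empty, the same check can be run against base to see how the keys are distributed there.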
Mehmet Tepedelenlioglu 2013-04-02, 01:20
F. Jerrell Schivers 2013-09-04, 23:39