Pig user mailing list


Using average function is really slow
Hi,

I am using the built-in org.apache.pig.builtin.AVG function. I have a set
of 100,000 items that I want to average.

The relevant Pig Latin is below:
L = FOREACH K GENERATE AVG(I.productcost), AVG(I.deliverycost);
STORE L INTO 'output' USING PigStorage (',');
In the Hadoop Admin Console, I can see several jobs that finish quickly
(they all use many map and reduce tasks).

However, eventually Hadoop executes a job with a single map task and a single
reduce task, which is taking forever to finish (it has been running for
several hours so far). All the map and reduce tasks report 100% complete, but
the "Map output records" counter is still slowly increasing and the job status
remains 'Running'.

Could anyone provide any advice on how I could go about diagnosing the
cause of this problem? I suspect the average function is taking a long time
to execute, but I would not expect averaging 100,000 items to take that
long.

Thanks,
James
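For context on the expectation that averaging 100,000 values should be cheap: Pig's built-in AVG is an algebraic function, so its work can in principle be split into partial sums and counts that are merged (for example by a combiner) before a single final division. The Java sketch below illustrates that general technique only. It is not Pig's implementation, and the class and method names (AlgebraicAverage, initial, combine, finish, average) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an algebraic (combinable) average. It mirrors the
// general initial / intermediate / final split, not Pig's actual AVG code.
public class AlgebraicAverage {

    // Partial aggregate for one slice of input: running sum and count.
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // "Initial" step: one slice of raw values becomes a single partial.
    static Partial initial(double[] values, int from, int to) {
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += values[i];
        }
        return new Partial(sum, to - from);
    }

    // "Intermediate" step: merge partials without touching the raw values.
    static Partial combine(List<Partial> partials) {
        double sum = 0.0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    // "Final" step: one division; returns null when there were no values.
    static Double finish(Partial p) {
        return p.count == 0 ? null : p.sum / p.count;
    }

    // Averages 'values' by cutting them into 'chunks' slices, roughly the way
    // parallel map tasks would each see only part of the input.
    static Double average(double[] values, int chunks) {
        List<Partial> partials = new ArrayList<>();
        int n = values.length;
        for (int c = 0; c < chunks; c++) {
            int from = (int) ((long) n * c / chunks);
            int to = (int) ((long) n * (c + 1) / chunks);
            partials.add(initial(values, from, to));
        }
        return finish(combine(partials));
    }

    public static void main(String[] args) {
        double[] costs = new double[100_000];
        for (int i = 0; i < costs.length; i++) {
            costs[i] = i + 1; // 1, 2, ..., 100000
        }
        System.out.println(average(costs, 8)); // prints 50000.5
    }
}
```

Because each partial carries only a sum and a count, the final step handles one pair per slice rather than every input value, and the arithmetic is linear in the input size. This suggests that averaging 100,000 values by itself should be trivial, though the sketch does not say anything about why the job in question is slow.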
Ruslan Al-Fakikh 2012-07-04, 21:05