Using average function is really slow

I am using the built-in org.apache.pig.builtin.AVG function. I have a set
of 100,000 items that I want to average.

The relevant Pig Latin is below:

L = FOREACH K GENERATE AVG(I.productcost), AVG(I.deliverycost);
STORE L INTO 'output' USING PigStorage (',');

In the Hadoop Admin Console, I can see several jobs that finish quickly (I
can see they all use many map and reduce tasks).

However, eventually Hadoop executes a job with a single map and reduce task
which is taking forever to finish (it has been running for several hours so
far). All the map and reduce tasks report 100% complete, but I can see that
one of the statistics called "Map output records" is slowly increasing and
the job status remains as 'Running'.

Could anyone provide any advice on how I could go about diagnosing the
cause of this problem? I suspect the average function is taking a long time
to execute, but I thought calculating the average of 100,000 items would
not take that long.
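
As a rough sanity check on that expectation, here is a minimal Python sketch, run outside Pig and Hadoop, that averages 100,000 values in a single pass. The values are randomly generated stand-in data, and the names `average` and `product_costs` are hypothetical. It only illustrates that the arithmetic itself is cheap; it says nothing about how Pig's AVG is implemented or executed.

```python
import random
import time


def average(values):
    """Return the arithmetic mean of values, or None if there are none."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
    return total / count if count else None


# Hypothetical stand-in data: 100,000 random costs (not the real data set).
random.seed(0)
product_costs = [random.uniform(1.0, 100.0) for _ in range(100_000)]

# Time only the averaging step, not the data generation.
start = time.perf_counter()
avg_cost = average(product_costs)
elapsed = time.perf_counter() - start

print(f"average = {avg_cost:.2f}, computed in {elapsed:.4f} s")
```

If the averaging step alone is this cheap, the time is presumably being spent elsewhere in the job, which would fit with the "Map output records" counter still growing.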

Ruslan Al-Fakikh 2012-07-04, 21:05