Pig >> mail # user >> Using average function is really slow


Re: Using average function is really slow
Hi James,

AVG is Algebraic, which means that Pig will use the combiner when it
can. It seems that your job is not using the combiner. Can you post the
full script? Also check the job config of the running job. If it is
using the combiner, you should see something like
pig.job.feature=GROUP_BY,COMBINER
pig.alias=L (that would mean the job really corresponds to the
statement you gave, not to the other statements)
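
As a rough illustration of a script shape where the combiner can
normally kick in, here is a minimal sketch. The input path, the field
names and types, and the GROUP ... ALL step are assumptions, since the
full script was not posted; only the FOREACH and STORE lines come from
the original message.

```pig
-- Hypothetical load step: path and schema are assumed for illustration.
I = LOAD 'input' USING PigStorage(',')
    AS (productcost:double, deliverycost:double);

-- Assumed grouping: put every record into a single group.
K = GROUP I ALL;

-- Only algebraic functions applied directly to the grouped bag, so the
-- partial sums and counts can be computed map-side by the combiner.
L = FOREACH K GENERATE AVG(I.productcost), AVG(I.deliverycost);

-- Print the plans for L without running the job.
EXPLAIN L;

STORE L INTO 'output' USING PigStorage(',');
```

If the combiner is being used, the MapReduce plan printed by EXPLAIN
should include a combine plan for that job. If it does not, something
else in the FOREACH or in the way K is built is probably preventing
it, which is why the full script would help.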

Ruslan

On Wed, Jul 4, 2012 at 9:37 PM, James Newhaven <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am using the built-in org.apache.pig.builtin.AVG function. I have a set
> of 100,000 items that I want to average.
>
> The relevant pig latin is below:
>
>
> L = FOREACH K GENERATE AVG(I.productcost), AVG(I.deliverycost);
> STORE L INTO 'output' USING PigStorage (',');
>
>
> In the Hadoop Admin Console, I can see several jobs that finish quickly (I
> can see they all use many map and reduce tasks).
>
> However, eventually Hadoop executes a job with a single map and reduce task
> which is taking forever to finish (it has been running for several hours so
> far). All the map and reduce tasks report 100% complete, but I can see that
> one of the statistics called "Map output records" is slowly increasing and
> the job status remains as 'Running'.
>
> Could anyone provide any advice on how I could go about diagnosing the
> cause of this problem? I suspect the average function is taking a long time
> to execute, but I thought calculating the average of 100,000 items would
> not take that long.
>
> Thanks,
> James