MapReduce >> mail # user >> Re: How to reduce total shuffle time


Gaurav Dasgupta 2012-08-29, 04:33
Gaurav Dasgupta 2012-08-28, 07:16
Re: How to reduce total shuffle time
It depends on the workload. Could you tell us more about your job?
In the general case where the reducers are the bottleneck, there are
some tuning techniques, as follows:
1. Allocate more memory to the reducers. This reduces the reducers'
disk I/O during the merge phase and while running the reduce function.
2. Use a combine function, which enables mapper-side aggregation,
if your MR job consists of operations that satisfy both the
commutative and the associative laws.
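
To illustrate the second point, here is a minimal sketch in plain Java
(no Hadoop dependency) of what a combine function does to one map
task's output. The class and method names (CombinerSketch, mapOutput,
combine) are hypothetical and only stand in for the framework. Summing
works here because addition is commutative and associative, so partial
sums computed on the map side give the same final result.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombinerSketch {
    // Hypothetical stand-in for one map task's output: one (word, 1) pair per input word.
    static List<Map.Entry<String, Integer>> mapOutput(String[] words) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : words) {
            out.add(new AbstractMap.SimpleEntry<>(w, 1));
        }
        return out;
    }

    // Mapper-side aggregation: sum the values per key before anything is shuffled.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            combined.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "a", "c", "a", "b"};
        List<Map.Entry<String, Integer>> raw = mapOutput(words);
        Map<String, Integer> combined = combine(raw);
        // Fewer records leave the mapper, so there is less data to shuffle.
        System.out.println("records without combiner: " + raw.size());
        System.out.println("records with combiner:    " + combined.size());
        System.out.println(combined);
    }
}
```

In a real job the combiner is registered on the job itself (for
example with setCombinerClass), and for a plain sum the reducer class
can often be reused as the combiner. How much shuffle time this saves
depends on how many duplicate keys each map task produces.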

For more on combine functions, see:
http://wiki.apache.org/hadoop/HadoopMapReduce
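
On the first suggestion (more memory for the reducers), a rough
back-of-the-envelope sketch of why it helps. It assumes Hadoop 1.x
behaviour, where the reducer's in-memory shuffle buffer is a fraction
of the task heap: the heap is set via mapred.child.java.opts and the
fraction via mapred.job.shuffle.input.buffer.percent. The 0.70
fraction and the 200 MB heap below are assumed defaults, so please
check them against the mapred-default.xml of your version. The class
and method names are hypothetical.

```java
public class ShuffleBufferSketch {
    // Approximate in-memory shuffle buffer (MB) for a given reducer heap (MB).
    // inputBufferPercent stands for mapred.job.shuffle.input.buffer.percent (assumed).
    static long shuffleBufferMb(long heapMb, double inputBufferPercent) {
        return Math.round(heapMb * inputBufferPercent);
    }

    public static void main(String[] args) {
        double percent = 0.70;       // assumed default fraction of the heap
        long[] heapsMb = {200, 1024}; // assumed default heap vs. a larger heap
        for (long heapMb : heapsMb) {
            System.out.println("heap " + heapMb + " MB -> shuffle buffer about "
                    + shuffleBufferMb(heapMb, percent) + " MB");
        }
    }
}
```

Map outputs that do not fit in this buffer are spilled to disk and
merged later, so a larger heap (or a larger fraction) means fewer
spills during the shuffle, as long as the nodes have the physical
memory to back it.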

Tsuyoshi

On Tuesday, August 28, 2012, Gaurav Dasgupta wrote:
>
> Hi,
>
> I have run some large and small jobs and calculated the Total Shuffle Time for the jobs. I can see that the Total Shuffle Time is almost half the Total Time which was taken by the full job to complete.
>
> My question is: how can we decrease the Total Shuffle Time? And what effect would doing so have on the job?
>
> Thanks,
> Gaurav Dasgupta