MapReduce, mail # user - Re: How to reduce total shuffle time


Re: How to reduce total shuffle time
Tsuyoshi OZAWA 2012-08-28, 08:37
It depends on the workload. Could you tell us more about your job?
In the general case where the reducers are the bottleneck, there are
some tuning techniques, as follows:
1. Allocate more memory to the reducers. This decreases the reducers'
disk I/O when merging map outputs and running the reduce function.
2. Use a combine function, which enables mapper-side aggregation, if
your MR job consists of operations that satisfy both the commutative
and the associative laws.
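
To illustrate point 2, here is a minimal sketch in plain Python (not the Hadoop API) of how mapper-side combining can shrink the number of records that go through the shuffle. It is a toy word count; the function names (`map_words`, `combine`, `reduce_counts`, `run_job`) and the sample input are hypothetical and chosen only for this illustration. Summation is commutative and associative, so the final result should not change when the combiner is applied.

```python
from collections import Counter


def map_words(line):
    """Map step: emit one (word, 1) pair per word."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combine step: sum counts per key on the mapper side, before the shuffle."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def reduce_counts(shuffled_pairs):
    """Reduce step: sum all values received for each key."""
    totals = Counter()
    for key, value in shuffled_pairs:
        totals[key] += value
    return dict(totals)


def run_job(splits, use_combiner):
    """Run the toy job and return (result, number of records shuffled)."""
    shuffled = []
    for split in splits:  # each split stands in for one mapper's input
        pairs = [pair for line in split for pair in map_words(line)]
        if use_combiner:
            pairs = combine(pairs)
        shuffled.extend(pairs)
    return reduce_counts(shuffled), len(shuffled)


# Hypothetical input: two splits, i.e. two simulated mappers.
splits = [
    ["a b a", "a c"],
    ["b b a", "c c c"],
]

plain_result, plain_records = run_job(splits, use_combiner=False)
combined_result, combined_records = run_job(splits, use_combiner=True)

print(plain_records, combined_records)  # 11 records without the combiner, 6 with it
print(plain_result == combined_result)  # True: the final counts are identical
```

In a real job the saving depends on how many duplicate keys each mapper emits; if keys rarely repeat within one mapper's output, a combiner will help little.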

For more about combine functions, see:
http://wiki.apache.org/hadoop/HadoopMapReduce

Tsuyoshi

On Tuesday, August 28, 2012, Gaurav Dasgupta wrote:
>
> Hi,
>
> I have run some large and small jobs and calculated the Total Shuffle Time for each. I can see that the Total Shuffle Time is almost half of the total time the full job took to complete.
>
> My question is: how can we decrease the Total Shuffle Time? And if we do so, what effect will it have on the job?
>
> Thanks,
> Gaurav Dasgupta