MapReduce, mail # user - Re: Best practices for hadoop shuffling/tunning ?


Re: Best practices for hadoop shuffling/tunning ?
Arun C Murthy 2012-01-31, 21:31
Moving to mapreduce-user@, bcc common-user@. Please use project specific lists.

Your io.sort.mb is too high. You only have 1G of heap for the map, so a 2000 MB sort buffer cannot fit in it. mapred.reduce.parallel.copies is too high too.
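
To make the two points above concrete, here is a rough sanity check over the values quoted below. It is only a sketch: the property names and numbers come from the original message, but the helper names, the `settings` dict layout, and the 70% headroom rule of thumb are assumptions made for illustration, not Hadoop defaults. It also assumes all 40 reduces run at once, which the 8 x 12 reduce slots would allow.

```python
# Hypothetical sanity check for the settings quoted in this thread.
# Property names and values are from the original message; the helper
# names and the 70% headroom fraction are illustrative assumptions.

settings = {
    "io.sort.mb": 2000,
    "map_heap_mb": 1024,  # from "mapred.map.child.java.opts = 1024 Mb"
    "mapred.reduce.tasks": 40,
    "mapred.reduce.parallel.copies": 80,
    "tasktracker.http.threads": 50,
    "nodes": 8,
}


def sort_buffer_fits(cfg, max_fraction=0.7):
    """Return True if io.sort.mb leaves headroom inside the map task heap."""
    return cfg["io.sort.mb"] <= max_fraction * cfg["map_heap_mb"]


def fetch_demand_vs_capacity(cfg):
    """Return (peak concurrent fetches, total HTTP serving threads).

    Demand assumes every reduce task runs at the same time and opens all
    of its parallel copier threads; capacity is the per-node HTTP thread
    count multiplied by the number of nodes.
    """
    demand = cfg["mapred.reduce.tasks"] * cfg["mapred.reduce.parallel.copies"]
    capacity = cfg["nodes"] * cfg["tasktracker.http.threads"]
    return demand, capacity


if __name__ == "__main__":
    print("sort buffer fits in map heap:", sort_buffer_fits(settings))
    demand, capacity = fetch_demand_vs_capacity(settings)
    print("peak fetches:", demand, "serving threads:", capacity)
```

With the quoted values the sort buffer is roughly twice the map heap, and the reduces can ask for far more simultaneous fetches than the tasktrackers have HTTP threads to serve.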

On Jan 30, 2012, at 4:50 AM, praveenesh kumar wrote:

> Hey guys,
>
> Just wanted to ask: are there any best practices to follow for improving
> Hadoop shuffle performance?
>
> I am running Hadoop 0.20.205 on an 8-node cluster. Each node has 24 cores/CPUs
> and 48 GB RAM.
>
> I have set the following parameters :
>
> fs.inmemory.size.mb=2000
> io.sort.mb=2000
> io.sort.factor=200
> io.file.buffer.size=262544
>
> mapred.map.tasks=200
> mapred.reduce.tasks=40
> mapred.reduce.parallel.copies=80
> mapred.map.child.java.opts = 1024 Mb
> mapred.map.reduce.java.opts=1024 Mb
>
> mapred.job.tracker.handler.count=60
> tasktracker.http.threads=50
> mapred.job.reuse.jvm.num.tasks = -1
> mapred.compress.map.output = true
> mapred.reduce.slowstart.completed.maps = 0.5
>
> mapred.tasktracker.map.tasks.maximum=24
> mapred.tasktracker.reduce.tasks.maximum=12
>
>
> Can anyone please validate the above tuning parameters and suggest any
> further improvements?
> My mappers are running fine. The shuffle and reduce phases are slower
> than I would expect for normal jobs. I would like to know what I am doing
> wrong or missing.
>
> Thanks,
> Praveenesh

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/