MapReduce >> mail # user >> Cluster config: Mapper:Reducer Task Capacity


Cluster config: Mapper:Reducer Task Capacity
Hi,

Our Hadoop cluster is running 0.20.203. The cluster currently has a 'Map
Task Capacity' of 8900+ and a 'Reduce Task Capacity' of 3300+, which gives a
map:reduce ratio of about 2.7. We run a wide variety of jobs and want to
increase the throughput.

From manual observation, we hit the Mapper capacity, so many jobs have to
wait even though there is a lot of room left in the Reduce capacity. I mined
the jobtracker logs for completed jobs and saw that, on an hourly as well as
a daily basis, the mapper:reducer ratio was 4-5.

To increase the throughput, I was thinking of experimenting with the Map
and Reduce Task Capacity so that the ratio increases from 2.7 to ~4.
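
To make the arithmetic concrete, here is a rough sketch of what that change
could mean in slot counts. It is only an illustration: it assumes the total
number of slots stays fixed at roughly 8900 + 3300 and only the split between
map and reduce changes, and `rebalance_slots` is a hypothetical helper, not a
Hadoop API.

```python
def rebalance_slots(map_slots, reduce_slots, target_ratio):
    """Split the same total slot count so that map:reduce is about target_ratio.

    Assumption: the cluster-wide total (map + reduce) is held constant.
    """
    total = map_slots + reduce_slots
    # With map = ratio * reduce and map + reduce = total,
    # reduce = total / (ratio + 1).
    new_reduce = round(total / (target_ratio + 1))
    new_map = total - new_reduce
    return new_map, new_reduce


# Approximate figures from the cluster described above ("8900+" and "3300+").
current_map, current_reduce = 8900, 3300
current_ratio = round(current_map / current_reduce, 1)

new_map, new_reduce = rebalance_slots(current_map, current_reduce, 4)
print(current_ratio, new_map, new_reduce)
```

Under those assumptions, moving to a ratio of 4 would shift several hundred
slots from reduce to map. The real numbers depend on how the slots are
spread across the nodes.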

Does this sound like the right approach? Is this something I can control,
or is it determined automatically by Hadoop?

Has anyone done this kind of exercise? If so, could you please point me to
how to go about changing this ratio? I am not finding much literature on
it.

Note: Map and Reduce Task Capacity are the maximum total number of
mappers/reducers that can run on the cluster at any one time.

Regards,
-Himanshu Vijay
Sandy Ryza 2013-09-30, 19:52
Himanshu Vijay 2013-10-01, 07:06