MapReduce >> mail # user >> Spilled records


Hi,

I was tuning a mapred job to reduce the number of spills and reached a stage where the following counters are all equal:

Spilled Records in map = Spilled Records in reduce = Combine output records = Reduce input records

I do not see any lines in the mapper logs containing the following strings:
1. Spilling map output: record full
2. Spilling map output: buffer full

I see only this one:
1. Finished spill 0 (note the 0 at the end)
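
To make the log check above repeatable, here is a minimal sketch that scans mapper log lines for the three message fragments quoted in this post. The function name `summarize_spills` and the shape of the returned dictionary are hypothetical, chosen only for illustration; the script assumes the log messages appear exactly as quoted above and knows nothing else about the log format.

```python
import re
from collections import Counter

# Message fragments exactly as quoted in this post.
RECORD_FULL = "Spilling map output: record full"
BUFFER_FULL = "Spilling map output: buffer full"
FINISHED_SPILL = re.compile(r"Finished spill (\d+)")


def summarize_spills(lines):
    """Count spill-trigger messages and collect the indices of finished spills."""
    counts = Counter()
    finished = []
    for line in lines:
        if RECORD_FULL in line:
            counts["record_full"] += 1
        elif BUFFER_FULL in line:
            counts["buffer_full"] += 1
        match = FINISHED_SPILL.search(line)
        if match:
            finished.append(int(match.group(1)))
    return {
        "record_full": counts["record_full"],
        "buffer_full": counts["buffer_full"],
        "finished_spills": finished,
    }


if __name__ == "__main__":
    # Illustrative input: the only spill-related line seen in the mapper log.
    print(summarize_spills(["Finished spill 0"]))
```

A result with zero `record_full` and `buffer_full` counts but a non-empty `finished_spills` list corresponds to the situation described here: a spill was written although neither threshold message was logged.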

I am confused. Can someone please explain what is going on?

1. Neither the buffer nor the record limit was reached, yet there are spills. Is it that the mapper writes its records to disk at the end for the reducers to consume, and that is why I see these spills?
2. Why is the combiner running if there were no spills? If my guess in point 1 is correct, will the combiner not run when the number of spills < min.num.spills.for.combine?
3. Why are spills counted in the reducer stats?
4. Is there a way to tell the mapper not to write its final output to disk, so that the reducers fetch the data from the mapper's main memory?

Regards,
Ajay Srivastava