MapReduce, mail # user - Spilled records

Spilled records
Ajay Srivastava 2013-01-23, 10:34

I was tuning a MapReduce job to reduce the number of spills and reached a stage where the following counters are all equal:

Spilled Records in map = Spilled Records in reduce = Combine output records = Reduce input records

I do not see any lines in the mapper logs containing the following strings:
1. Spilling map output: record full
2. Spilling map output: buffer full

I only see this string:
1. Finished spill 0 (note the 0 at the end)

I am confused. Can someone please explain what is going on?

1. Neither the buffer nor the record metadata got full, yet there are spills. Is it because the mapper writes its remaining records to disk at the end so that the reducers can consume them, and that final write is what I see counted as a spill?
2. Why is the combiner running if there were no spills? If my guess in point 1 is correct, will the combiner not run when the number of spills < min.num.spills.for.combine?
3. Why are spills counted in the reducer stats?
4. Is there a way to tell the mapper not to write its final output to disk, so that the reducers fetch the data from the mapper's main memory?
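
To make my guess in point 1 concrete, here is a toy Java model of what I think might be happening. It is only a sketch and not Hadoop's actual code: the class SpillSketch, its methods and fields, and the record-count capacity are all made up for illustration, and the combiner is assumed to be a simple per-key count. In this model a buffer that never fills still produces one "Finished spill 0" line from the final flush, and the spilled record count equals the combine output count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only. This is NOT Hadoop's map output buffer;
// every name below is hypothetical.
class SpillSketch {
    private final int capacity;                       // records held before a forced spill
    private final List<String> buffer = new ArrayList<>();
    final List<String> log = new ArrayList<>();       // stands in for the mapper log
    int spilledRecords = 0;                           // models the "Spilled Records" counter
    int combineOutputRecords = 0;                     // models "Combine output records"
    private int numSpills = 0;

    SpillSketch(int capacity) {
        this.capacity = capacity;
    }

    // Called once per map output record.
    void collect(String key) {
        if (buffer.size() == capacity) {
            log.add("Spilling map output: buffer full");
            sortAndSpill();
        }
        buffer.add(key);
    }

    // Called once when the map task ends: whatever is still buffered is written out.
    void flush() {
        if (!buffer.isEmpty()) {
            sortAndSpill();
        }
    }

    private void sortAndSpill() {
        // Assumed combiner: collapse duplicate keys into one (key, count) record.
        // TreeMap keeps the keys sorted, standing in for the sort step.
        Map<String, Integer> combined = new TreeMap<>();
        for (String key : buffer) {
            combined.merge(key, 1, Integer::sum);
        }
        combineOutputRecords += combined.size();
        spilledRecords += combined.size();            // only combined records reach disk
        log.add("Finished spill " + numSpills);
        numSpills++;
        buffer.clear();
    }

    public static void main(String[] args) {
        SpillSketch mapper = new SpillSketch(100);    // buffer far larger than the input
        for (String key : new String[] {"a", "b", "a", "c", "b", "a"}) {
            mapper.collect(key);
        }
        mapper.flush();
        System.out.println(mapper.log);
        System.out.println("spilled=" + mapper.spilledRecords
                + " combineOutput=" + mapper.combineOutputRecords);
    }
}
```

If the real mapper behaves like this, it would explain why I see only "Finished spill 0" and why the counters match, but I would like someone to confirm it.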

Ajay Srivastava