Ajay Srivastava 2013-01-23, 10:34
I was tuning a MapReduce job to reduce the number of spills and reached a stage where the following counters are equal:
Spilled Records in map = Spilled records in reduce = Combine output Records = Reduce Input Records
I do not see any lines in the mapper logs containing the following strings:
1. Spilling map output: record full
2. Spilling map output: buffer full
I see only this string:
1. Finished spill 0 (note the 0 at the end)
I am confused. Can someone please explain what is going on?
1. Neither the buffer nor the record limit was reached, yet there are spills. Is it that the mapper writes its records to disk at the end so the reducer can consume them, and that is why I see these spills?
2. Why is the combiner running if there were no spills? If my guess in point 1 is correct, will the combiner not run if the number of mappers < min.num.spills.for.combine?
3. Why are spills counted in the reducer stats?
4. Is there a way to tell the mapper not to write its final output to disk, so that the reducers fetch the data from the mapper's main memory?