For a MapReduce job that produces a lot of intermediate data between the mappers and
reducers, I implemented a combiner that uses a more compact representation
of the result data, and I verified that the final result is correct when the
combiner is enabled. However, when I look at the job counters "FILE_BYTES_WRITTEN" and
"Reduce shuffle bytes", the values with the combiner are about twice as large as
without it. My understanding is that these two counters reflect the
size of the mapper output, and with a combiner that size
should decrease, but that is not what I observe here.
Does this mean that my combiner is not working, and that it actually increases
the size of the mapper output?
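For context, the effect I expected can be sketched with a stand-alone simulation (plain Python, not the actual Hadoop job; all names here are illustrative) where a word-count-style combiner pre-aggregates map output before it is spilled and shuffled:

```python
# Hypothetical simulation of a map-side combiner, not the actual Hadoop job.
from collections import defaultdict

def map_phase(lines):
    # Word-count mapper: emit one (word, 1) pair per token.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def combine(pairs):
    # Combiner: pre-aggregate counts per key on the map side,
    # so each distinct key is emitted only once.
    acc = defaultdict(int)
    for key, value in pairs:
        acc[key] += value
    return list(acc.items())

def serialized_size(pairs):
    # Rough proxy for the bytes the framework would spill/shuffle:
    # key length plus a fixed-width count.
    return sum(len(key) + 8 for key, _ in pairs)

lines = ["the quick brown fox", "the lazy dog", "the fox"] * 100
raw = list(map_phase(lines))
combined = combine(raw)
print(serialized_size(raw), serialized_size(combined))
```

In this toy model the combined output is much smaller than the raw map output, which is the behavior I expected the counters to show.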
Software Engineer Intern @ KXEN Inc.
UTC - Université de Technologie de Compiègne
* **GI06 - Fouille de Données et Décisionnel**