I'm seeing strange behavior when I use the Hadoop join package.
After running a job, the result statistics show that my combiner has an
input of 100 records and an output of 100 records. From the task I'm
running and the way it's implemented, I know that each key appears multiple
times and the values should be combinable before getting passed to the
reducer. I'm running my tests in pseudo-distributed mode with one or two
map tasks. Stepping through with the debugger, I noticed that each key-value
pair is processed by the combiner individually, so the combiner never receives
a list of values that it could aggregate. Can anyone think of a reason for
this undesired behavior?
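
To make the expectation concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of what I assumed a combiner pass would do with the output of one map task. The class name `CombinerSketch`, the helper `kv`, the sample keys, and the use of summing as the combine operation are all hypothetical and only for illustration; this is not my actual job code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerSketch {
    // Hypothetical helper that builds one key-value record of map output.
    static Map.Entry<String, Integer> kv(String key, int value) {
        return new SimpleEntry<>(key, value);
    }

    // Simulates one combiner pass over the output of a single map task:
    // values that share a key are merged (summed here) into one record.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> record : mapOutput) {
            combined.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = Arrays.asList(
            kv("a", 1), kv("b", 2), kv("a", 3), kv("b", 4), kv("a", 5));

        Map<String, Integer> combined = combine(mapOutput);

        // prints "input records: 5, output records: 2"
        System.out.println("input records: " + mapOutput.size()
            + ", output records: " + combined.size());
        // prints "{a=9, b=6}"
        System.out.println(combined);
    }
}
```

In this sketch the output record count is smaller than the input count because keys repeat. In my job the two counts are identical (100 in, 100 out), even though the keys repeat there as well.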