Hadoop >> mail # user >> WritableComparable and the case of duplicate keys in the reducer


Hi All,

Here is what's happening. I have implemented my own WritableComparable keys
and values. Inside a reducer, I am seeing reduce() invoked _twice_ with the
"same" key.
I have checked that context.getKeyComparator() and context.getSortComparator()
both return a WritableComparator, which indicates that my key's compareTo
method should be called during the reduce-side merge.
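For reference, the key follows the usual WritableComparable pattern. A stripped-down sketch (hypothetical names, plain Java with no Hadoop dependency, with Comparable standing in for WritableComparable):

```java
import java.io.*;

// Hypothetical key class for illustration only; the real class would
// implement org.apache.hadoop.io.WritableComparable<TextPairKey>. The
// method shapes (write / readFields / compareTo) follow the Writable contract.
class TextPairKey implements Comparable<TextPairKey> {
    private String first = "";
    private String second = "";

    TextPairKey() {}

    TextPairKey(String first, String second) {
        this.first = first;
        this.second = second;
    }

    // Serialize the key. Hadoop compares keys either by deserializing and
    // calling compareTo, or with a raw-byte comparator over this output.
    void write(DataOutput out) throws IOException {
        out.writeUTF(first);
        out.writeUTF(second);
    }

    void readFields(DataInput in) throws IOException {
        first = in.readUTF();
        second = in.readUTF();
    }

    @Override
    public int compareTo(TextPairKey other) {
        int c = first.compareTo(other.first);
        return c != 0 ? c : second.compareTo(other.second);
    }

    @Override
    public int hashCode() {
        // The default HashPartitioner routes keys by hashCode(), so keys
        // that compare as equal must also hash equally.
        return first.hashCode() * 31 + second.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof TextPairKey && compareTo((TextPairKey) o) == 0;
    }
}
```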

Indeed, inside the 'reduce' method I captured both key instances and did the
following checks:

((WritableComparator)context.getKeyComparator()).compare((Object)key1,
(Object)key2)
((WritableComparator)context.getSortComparator()).compare((Object)key1,
(Object)key2)

In both calls, the result is '0', confirming that key1 and key2 are
equivalent.

So, what is going on?

Note that key1 and key2 come from different mappers, but they should have
been collapsed into a single reduce call, since they compare as equal under
the WritableComparator. Also note that key1 and key2 are not bitwise
equivalent, but that shouldn't matter, or should it?
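To make the bitwise point concrete, here is a self-contained sketch (hypothetical names, no Hadoop dependency) of how two keys can satisfy compareTo == 0 yet serialize to different bytes, e.g. when a field is written by write() but ignored by compareTo. As far as I can tell, the base WritableComparator deserializes both keys and calls compareTo, so the byte difference should be harmless there; a custom raw comparator that compares bytes directly would treat such keys as distinct.

```java
import java.io.*;
import java.util.Arrays;

// Illustration: a key whose compareTo ignores a 'tag' field that is
// nevertheless serialized. Two such keys are logically equal while their
// byte representations differ.
public class BitwiseDemo {
    static class TaggedKey implements Comparable<TaggedKey> {
        final String id;  // compared
        final int tag;    // serialized but NOT compared

        TaggedKey(String id, int tag) {
            this.id = id;
            this.tag = tag;
        }

        byte[] serialize() {
            try {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                DataOutputStream out = new DataOutputStream(bos);
                out.writeUTF(id);
                out.writeInt(tag);  // extra state leaks into the bytes
                return bos.toByteArray();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public int compareTo(TaggedKey o) {
            return id.compareTo(o.id);
        }
    }

    public static void main(String[] args) {
        TaggedKey k1 = new TaggedKey("user-42", 1);  // as if from mapper 1
        TaggedKey k2 = new TaggedKey("user-42", 2);  // as if from mapper 2

        // Logically equal, bitwise different.
        System.out.println("compareTo equal: " + (k1.compareTo(k2) == 0));
        System.out.println("bitwise equal:   "
                + Arrays.equals(k1.serialize(), k2.serialize()));
    }
}
```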

Many thanks in advance!

stan