|
|
Stan Rosenberg 2011-08-15, 01:20
Hi Folks,
After much poking around I am still unable to determine why I am seeing 'reduce' being called twice with the "same" key. Recall from my previous email that "sameness" is determined by 'compareTo' of my custom key type.
AFAIK, the default WritableComparator invokes 'compareTo' for any two keys which are being ordered during sorting and merging. Is it somehow possible that a bitwise comparator is used for the spilled map output rather than the default WritableComparator?
I am out of clues, short of studying the "shuffling" code. If anyone can suggest some further debugging steps, don't be shy. :)
Thanks!!!
stan
Joey Echeverria 2011-08-15, 01:33
Does your compareTo() method test object pointer equality? If so, you could be getting burned by Hadoop reusing Writable objects.
-Joey On Aug 14, 2011 9:20 PM, "Stan Rosenberg" <[EMAIL PROTECTED]> wrote: > Hi Folks, > > After much poking around I am still unable to determine why I am seeing > 'reduce' being called twice with the "same" key. > Recall from my previous email that "sameness" is determined by 'compareTo' > of my custom key type. > > AFAIK, the default WritableComparator invokes 'compareTo' for any two keys > which are being ordered during sorting and merging. > Is it somehow possible that a bitwise comparator is used for the spilled map > output rather than the default WritableComparator? > > I am out of clues, short of studying the "shuffling" code. If anyone can > suggest some further debugging steps, don't be shy. :) > > Thanks!!! > > stan
Stan Rosenberg 2011-08-15, 02:07
On Sun, Aug 14, 2011 at 9:33 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
> Does your compareTo() method test object pointer equality? If so, you could > be getting burned by Hadoop reusing Writable objects. Yes, but only the equality between enum values. Interestingly, when 'reduce' is called there are three instances of the "same" key. Two instances are correctly merged and they both come from the same mapper. The other instance comes from a different mapper, and for some reason does not get merged. I see the key and the values (corresponding to the two merged instances) passed as arguments to 'reduce'; then in subsequent 'reduce' call I see the key and the value corresponding to the third instance.
For completeness, here is my 'Key.compareTo':
public int compareTo(Key o) { if (this.type != o.type) { // Type.X < Type.Y return (this.type == Type.X ? -1 : 1); } // otherwise, delegate if (this.type == Type.X) { return this.key1.compareTo(o.key1); } else { return this.key2.compareTo(o.key2); } }
The 'type' field is an enum with two possible values, say X and Y. Key is essentially a union type; i.e., at any given time it's the values in key1 or key2 that are being compared (depending on the 'type' value).
Joey Echeverria 2011-08-15, 02:25
What are the types of key1 and key2? What does the readFields() method look like?
-Joey
On Sun, Aug 14, 2011 at 10:07 PM, Stan Rosenberg <[EMAIL PROTECTED]> wrote: > On Sun, Aug 14, 2011 at 9:33 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote: > >> Does your compareTo() method test object pointer equality? If so, you could >> be getting burned by Hadoop reusing Writable objects. > > > Yes, but only the equality between enum values. Interestingly, when > 'reduce' is called there are three instances of the "same" key. > Two instances are correctly merged and they both come from the same mapper. > The other instance comes from a different mapper, and for > some reason does not get merged. I see the key and the values > (corresponding to the two merged instances) passed as arguments > to 'reduce'; then in subsequent 'reduce' call I see the key and the value > corresponding to the third instance. > > For completeness, here is my 'Key.compareTo': > > public int compareTo(Key o) { > if (this.type != o.type) { > // Type.X < Type.Y > return (this.type == Type.X ? -1 : 1); > } > // otherwise, delegate > if (this.type == Type.X) { > return this.key1.compareTo(o.key1); > } else { > return this.key2.compareTo(o.key2); > } > } > > The 'type' field is an enum with two possible values, say X and Y. Key is > essentially a union type; i.e., at any given time > it's the values in key1 or key2 that are being compared (depending on the > 'type' value). >
-- Joseph Echeverria Cloudera, Inc. 443.305.9434
Stan Rosenberg 2011-08-15, 02:39
On Sun, Aug 14, 2011 at 10:25 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
> What are the types of key1 and key2? What does the readFields() method > look like? The type of key1 is essentially a wrapper for java.util.UUID. Here is its readFields:
public void readFields(DataInput in) throws IOException { id = new UUID(in.readLong(), in.readLong()); }
So, it reconstitutes the UUID by deserializing two longs. The 'compareTo' method of this key type delegates to java.util.UUID.compareTo.
The type of key2 wraps a different id, one that fits into a long. In addition to an id, it also stores an enum which designates the "source" of this id. Here is its readFields:
public void readFields(DataInput in) throws IOException { source = Source.values()[in.readByte() & 0xFF]; id = in.readLong(); }
The source is an enum value which is serialized by writing its ordinal. (There are only two possible enum values, hence only one byte.) The 'compareTo' method of this key type orders by the source values if the id values are different, otherwise by the id values.
Chris White 2011-08-16, 10:14
Are you using a hash partioner? If so make sure the hash value of the writable is not calculated using the hashCode value of the enum - use the ordinal value instead. The hashcode value of an enum is different for each jvm.
Stan Rosenberg 2011-08-16, 14:20
On Tue, Aug 16, 2011 at 6:14 AM, Chris White <[EMAIL PROTECTED]>wrote:
> Are you using a hash partioner? If so make sure the hash value of the > writable is not calculated using the hashCode value of the enum - use the > ordinal value instead. The hashcode value of an enum is different for each > jvm. >
Thanks for the tip. I am using a hash partitioner (its the default) but my hash value is not based on an enum value. In any case, the keys in question get hashed to the same reducer.
Best,
stan
Chris White 2011-08-16, 20:47
Can you copy the contents of your parent Writable readField and write methods (not the ones youve already posted)
Another thing you could try is if you know you have two identical keys, can you write a unit test to examine the result of compareTo for two instances to confirm the correct behavior (even going as far as serializing and deserializing before the comparison)
Finally just to confirm, you dont have any group or order comparators registered?
Stan Rosenberg 2011-08-22, 15:53
Thanks for all the help on this issue. It turned out to be a very simple problem with my 'compareTo' implementation. The ordering was symmetric but _not_ transitive.
stan
On Tue, Aug 16, 2011 at 4:47 PM, Chris White <[EMAIL PROTECTED]>wrote:
> Can you copy the contents of your parent Writable readField and write > methods (not the ones youve already posted) > > Another thing you could try is if you know you have two identical keys, can > you write a unit test to examine the result of compareTo for two instances > to confirm the correct behavior (even going as far as serializing and > deserializing before the comparison) > > Finally just to confirm, you dont have any group or order comparators > registered? >
|
|