-
is implementing WritableComparable and setting Job.setSortComparatorClass(...) redundant?
Jane Wayne 2012-03-20, 06:47
Quick question:
I have a key that already implements WritableComparable. This will be the intermediate key passed from the map to the reducer.
Is it necessary to extend RawComparator and set it on Job.setSortComparatorClass(Class<? extends RawComparator> cls)?
-
Re: is implementing WritableComparable and setting Job.setSortComparatorClass(...) redundant?
Chris White 2012-03-20, 10:30
Setting sortComparatorClass lets you configure a RawComparator implementation, which can compare keys more efficiently at the byte level. If you don't set it, Hadoop uses WritableComparator by default. This default implementation deserializes the bytes into key instances using your readFields method and then calls compareTo to determine key ordering (see the source of org.apache.hadoop.io.WritableComparator.compare(byte[], int, int, byte[], int, int)).
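To make the default path concrete, here is a minimal sketch that runs without Hadoop on the classpath. PairKey is a hypothetical key with the same write/readFields/compareTo shape as a WritableComparable, and DeserializingComparator is a hypothetical class that models what the generic WritableComparator.compare(byte[], int, int, byte[], int, int) does: read both keys back from their serialized bytes, then call compareTo. The real WritableComparator reuses its key instances and input buffer rather than allocating on every call; this sketch allocates for simplicity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical key with the same write/readFields/compareTo shape as a
// WritableComparable, but without any Hadoop dependency.
final class PairKey implements Comparable<PairKey> {
    int first;
    int second;

    PairKey() {}

    PairKey(int first, int second) {
        this.first = first;
        this.second = second;
    }

    void write(DataOutput out) throws IOException {
        out.writeInt(first);
        out.writeInt(second);
    }

    void readFields(DataInput in) throws IOException {
        first = in.readInt();
        second = in.readInt();
    }

    @Override
    public int compareTo(PairKey o) {
        int c = Integer.compare(first, o.first);
        return c != 0 ? c : Integer.compare(second, o.second);
    }
}

// Models the default path: rebuild both keys from their bytes, then call compareTo.
final class DeserializingComparator {
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        try {
            PairKey k1 = new PairKey();
            PairKey k2 = new PairKey();
            k1.readFields(new DataInputStream(new ByteArrayInputStream(b1, s1, l1)));
            k2.readFields(new DataInputStream(new ByteArrayInputStream(b2, s2, l2)));
            return k1.compareTo(k2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Serializes a key the way the map side would before sorting.
    static byte[] serialize(PairKey key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            key.write(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Every comparison here pays for two readFields calls plus the object work behind them, which is the overhead a hand-written RawComparator is meant to avoid.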
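So setting a sort comparator is not required when your key already implements WritableComparable. Unless you need the comparison to be as efficient as possible, delegating to WritableComparator is probably fine.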
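A byte-level alternative, of the kind you might point setSortComparatorClass at, could look like the sketch below. It assumes a hypothetical key serialized as two big-endian ints (the layout DataOutput.writeInt produces) and compares the fields straight from the byte arrays without creating key objects. In a real job you would extend WritableComparator and use its static readInt helper; readInt is re-implemented here only so the sketch runs without Hadoop.

```java
import java.nio.ByteBuffer;

// Hypothetical raw comparator for a key serialized as two big-endian ints.
// It reads the fields directly from the byte arrays; no key objects are built.
final class RawPairComparator {
    // Same contract as WritableComparator.readInt(byte[], int): 4 bytes, big-endian.
    static int readInt(byte[] b, int start) {
        return ((b[start] & 0xff) << 24)
             | ((b[start + 1] & 0xff) << 16)
             | ((b[start + 2] & 0xff) << 8)
             | (b[start + 3] & 0xff);
    }

    // Compare by the first int, then by the second. The lengths are unused
    // because this assumed layout is always 8 bytes per key.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int c = Integer.compare(readInt(b1, s1), readInt(b2, s2));
        return c != 0 ? c : Integer.compare(readInt(b1, s1 + 4), readInt(b2, s2 + 4));
    }

    // Example helper: produces the same bytes DataOutput.writeInt would.
    static byte[] encode(int first, int second) {
        return ByteBuffer.allocate(8).putInt(first).putInt(second).array();
    }
}
```

The ints are decoded and compared with Integer.compare rather than compared byte by byte, because an unsigned byte comparison of two's-complement values would sort negative numbers after positive ones.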
Note that you can also provide a RawComparator for your key class by registering it with WritableComparator in a static block; Hadoop then uses it by default, without any call to setSortComparatorClass. See the source for Text for an example:
/** A WritableComparator optimized for Text keys. */
public static class Comparator extends WritableComparator {
  public Comparator() {
    super(Text.class);
  }

  public int compare(byte[] b1, int s1, int l1,
                     byte[] b2, int s2, int l2) {
    // n1/n2: size of the vint length prefix, skipped before comparing the UTF-8 bytes
    int n1 = WritableUtils.decodeVIntSize(b1[s1]);
    int n2 = WritableUtils.decodeVIntSize(b2[s2]);
    return compareBytes(b1, s1+n1, l1-n1, b2, s2+n2, l2-n2);
  }
}

static {
  // register this comparator
  WritableComparator.define(Text.class, new Comparator());
}
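The registration route and its interaction with setSortComparatorClass can be modeled in a few lines. In this Hadoop-free sketch, ComparatorRegistry.define plays the role of WritableComparator.define, and resolve encodes the order of preference: an explicitly configured sort comparator wins, otherwise a comparator registered for the key class, otherwise the generic deserializing one. ComparatorRegistry, MyKey and resolve are hypothetical names, not Hadoop's implementation. The sketch forces class initialization with Class.forName before the lookup, because referring to MyKey.class alone does not run its static block.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Simplified, Hadoop-free model of comparator registration and lookup.
final class ComparatorRegistry {
    private static final Map<Class<?>, Comparator<byte[]>> REGISTERED = new HashMap<>();

    // Plays the role of WritableComparator.define(Class, WritableComparator).
    static synchronized void define(Class<?> keyClass, Comparator<byte[]> cmp) {
        REGISTERED.put(keyClass, cmp);
    }

    // Preference: explicit sort comparator, then a registered one, then the fallback.
    static Comparator<byte[]> resolve(Class<?> keyClass,
                                      Comparator<byte[]> explicitSortComparator,
                                      Comparator<byte[]> genericFallback) {
        if (explicitSortComparator != null) {
            return explicitSortComparator;
        }
        forceInit(keyClass); // make sure the key's static block has run
        Comparator<byte[]> registered;
        synchronized (ComparatorRegistry.class) {
            registered = REGISTERED.get(keyClass);
        }
        return registered != null ? registered : genericFallback;
    }

    private static void forceInit(Class<?> c) {
        try {
            Class.forName(c.getName(), true, c.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("cannot initialize " + c, e);
        }
    }
}

// Hypothetical key class that registers a raw comparator in a static block,
// in the same style as the Text excerpt.
final class MyKey {
    // Unsigned lexicographic byte comparison, similar in spirit to compareBytes.
    static final Comparator<byte[]> RAW = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    };

    static {
        ComparatorRegistry.define(MyKey.class, RAW);
    }
}
```

With only the static registration in place, resolve returns MyKey.RAW; passing an explicit comparator overrides it, and a key class with no registration gets the fallback.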
Chris
-
Re: is implementing WritableComparable and setting Job.setSortComparatorClass(...) redundant?
Jane Wayne 2012-03-20, 15:57
Thanks, Chris!