Re: WordPairCount Mapreduce question.
Sai Sai 2013-02-24, 11:31
Thanks Mahesh for your help.
Wondering if you can provide some insight into the compare method below, which uses byte[], from the SecondarySort example:

    public static class Comparator extends WritableComparator {
        public Comparator() {
            super(URICountKey.class);
        }

        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return compareBytes(b1, s1, l1, b2, s2, l2);
        }
    }

My question: in the compareTo method I have given below we are comparing word1/word2, which makes sense. But what about this byte[] comparison? Is it right to assume that it converts each object's word1/word2/word3 to byte[] and compares them? If so, is it done for performance reasons?
Could you please verify.
Thanks
Sai

________________________________
From: Mahesh Balija <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; Sai Sai <[EMAIL PROTECTED]>
Sent: Saturday, 23 February 2013 5:23 AM
Subject: Re: WordPairCount Mapreduce question.

Please check the in-line answers...

On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai <[EMAIL PROTECTED]> wrote:
>
>Hello
>
>I have a question about how Mapreduce sorting works internally with multiple columns.
>
>Below are my classes, using 2 columns in the input file given below.
>
>1st question: About the method hashCode, we are adding a "31 * "; I am wondering why this is required. What does 31 refer to?
>
This is how the hash code is usually calculated for any String instance (s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]), where n stands for the length of the String. Since in your case you only have 2 fields, it will be a * 31^0 + b * 31^1.
>
>2nd question: What if my input file has 3 columns instead of 2? How would you write the compare method? If anyone can map this to a real-world scenario, it will be really helpful.
>
You will extend the same approach for the third column,

    public int compareTo(WordPairCountKey o) {
        int diff = word1.compareTo(o.word1);
        if (diff == 0) {
            diff = word2.compareTo(o.word2);
            if (diff == 0) {
                diff = word3.compareTo(o.word3);
            }
        }
        return diff;
    }

>
>    @Override
>    public int compareTo(WordPairCountKey o) {
>        int diff = word1.compareTo(o.word1);
>        if (diff == 0) {
>            diff = word2.compareTo(o.word2);
>        }
>        return diff;
>    }
>
>    @Override
>    public int hashCode() {
>        return word1.hashCode() + 31 * word2.hashCode();
>    }
>
>******************************
>
>Here is my input file wordpair.txt
>
>******************************
>
>a b
>a c
>a b
>a d
>b d
>e f
>b d
>e f
>b d
>
>**********************************
>
>Here is my WordPairObject:
>
>*********************************
>
>public class WordPairCountKey implements WritableComparable<WordPairCountKey> {
>
>    private String word1;
>    private String word2;
>
>    @Override
>    public int compareTo(WordPairCountKey o) {
>        int diff = word1.compareTo(o.word1);
>        if (diff == 0) {
>            diff = word2.compareTo(o.word2);
>        }
>        return diff;
>    }
>
>    @Override
>    public int hashCode() {
>        return word1.hashCode() + 31 * word2.hashCode();
>    }
>
>    public String getWord1() {
>        return word1;
>    }
>
>    public void setWord1(String word1) {
>        this.word1 = word1;
>    }
>
>    public String getWord2() {
>        return word2;
>    }
>
>    public void setWord2(String word2) {
>        this.word2 = word2;
>    }
>
>    @Override
>    public void readFields(DataInput in) throws IOException {
>        word1 = in.readUTF();
>        word2 = in.readUTF();
>    }
>
>    @Override
>    public void write(DataOutput out) throws IOException {
>        out.writeUTF(word1);
>        out.writeUTF(word2);
>    }
>
>    @Override
>    public String toString() {
>        return "[word1=" + word1 + ", word2=" + word2 + "]";
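
On the byte[] comparison asked about at the top of this message: as far as I understand it, a raw comparator does not convert the objects at compare time. The keys are already in serialized form (whatever write() produced), and the raw compare works on those bytes directly so the sort can skip deserializing, which is why it is usually described as a performance optimization. The catch is that byte order only matches compareTo() if the serialized layout happens to sort the same way. Below is a rough sketch of that point using plain java.io only, no Hadoop classes. The names RawVsObjectOrder, serialize, rawCompare and objectCompare are made up for this illustration, and rawCompare is just a hand-written unsigned lexicographic comparison standing in for a raw byte compare, not Hadoop's actual code.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RawVsObjectOrder {

    // Serializes a two-word key the same way the quoted write() method does:
    // writeUTF(word1) followed by writeUTF(word2).
    static byte[] serialize(String word1, String word2) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeUTF(word1);
            out.writeUTF(word2);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical stand-in for a raw byte comparison: unsigned, byte by byte,
    // lexicographic; on a shared prefix the shorter array sorts first.
    static int rawCompare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Field-by-field comparison, mirroring the quoted compareTo().
    static int objectCompare(String a1, String a2, String b1, String b2) {
        int diff = a1.compareTo(b1);
        if (diff == 0) {
            diff = a2.compareTo(b2);
        }
        return diff;
    }

    public static void main(String[] args) {
        // Same-length words: both orderings agree.
        System.out.println(Integer.signum(objectCompare("a", "b", "a", "c")));
        System.out.println(Integer.signum(
                rawCompare(serialize("a", "b"), serialize("a", "c"))));

        // Different-length first words: "aa" < "b" as Strings, but writeUTF's
        // 2-byte length prefix (0x0002 vs 0x0001) flips the raw byte order.
        System.out.println(Integer.signum(objectCompare("aa", "x", "b", "x")));
        System.out.println(Integer.signum(
                rawCompare(serialize("aa", "x"), serialize("b", "x"))));
    }
}
```

With the single-character words in wordpair.txt the two orderings agree, so the difference would not show up on that input. It only matters once the words can have different lengths, because writeUTF puts a 2-byte length in front of each string.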