Hadoop >> mail # user >> Re: WordPairCount Mapreduce question.


Re: WordPairCount Mapreduce question.
Thanks, Mahesh, for your help.

I was wondering if you could provide some insight into the compare method below, which uses byte[], from the SecondarySort example:

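public static class Comparator extends WritableComparator {
        public Comparator() {
            super(URICountKey.class);
        }

        // Compares the two keys' serialized bytes directly,
        // without deserializing them into URICountKey objects.
        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return compareBytes(b1, s1, l1, b2, s2, l2);
        }
    }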
My question: in the compareTo method I have given below, we compare word1/word2, which makes sense. But what about this byte[] comparison? Am I right in assuming that it converts each object's word1/word2/word3 to a byte[] and compares them?
If so, is it done for performance reasons?
Could you please verify?
Thanks
Sai
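A plain-Java sketch of what such a raw comparator does with the WordPairCountKey serialization from this thread. The class RawCompareDemo and its helpers are hypothetical, and compareRaw is assumed to behave like WritableComparator.compareBytes (an unsigned, byte-by-byte lexicographic comparison). The raw compare does not itself convert word1/word2 to bytes: it is handed the bytes the keys were already serialized to, and skipping deserialization during the sort is the usual reason for writing one. One catch: writeUTF puts a 2-byte length before each string, so comparing the whole record byte by byte orders by string length first and can disagree with compareTo.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RawCompareDemo {

    // Serializes a (word1, word2) pair the same way WordPairCountKey.write() does.
    static byte[] serialize(String word1, String word2) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(word1); // 2-byte length prefix, then modified UTF-8 bytes
            out.writeUTF(word2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Unsigned lexicographic byte comparison; assumed to mirror what
    // WritableComparator.compareBytes does over (start, length) ranges.
    static int compareRaw(byte[] b1, byte[] b2) {
        int n = Math.min(b1.length, b2.length);
        for (int i = 0; i < n; i++) {
            int a = b1[i] & 0xff;
            int b = b2[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return b1.length - b2.length;
    }

    // Object-level comparison, as in WordPairCountKey.compareTo().
    static int compareObjects(String a1, String a2, String b1, String b2) {
        int diff = a1.compareTo(b1);
        return diff != 0 ? diff : a2.compareTo(b2);
    }

    public static void main(String[] args) {
        // "b" vs "aa": the serialized forms start with length prefixes (0,1) vs (0,2).
        int objectOrder = compareObjects("b", "x", "aa", "x");
        int rawOrder = compareRaw(serialize("b", "x"), serialize("aa", "x"));
        System.out.println("compareTo: " + Integer.signum(objectOrder));
        System.out.println("raw bytes: " + Integer.signum(rawOrder));
    }
}
```

Running main prints "compareTo: 1" and "raw bytes: -1": as objects "aa" sorts before "b", but as raw bytes it sorts after. A raw comparator for this key would likely need to account for the length prefixes instead of calling compareBytes on the whole record.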
________________________________
 From: Mahesh Balija <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; Sai Sai <[EMAIL PROTECTED]>
Sent: Saturday, 23 February 2013 5:23 AM
Subject: Re: WordPairCount Mapreduce question.
 

Please check the inline answers below.
On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai <[EMAIL PROTECTED]> wrote:
>
>Hello
>
>
>I have a question about how MapReduce sorting works internally with multiple columns.
>
>
>Below are my classes, which use the 2 columns of the input file given below.
>
>
>
>1st question: In the hashCode method we are using "31 * ". I am wondering why this is required. What does 31 refer to?
>
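This is the usual way hash codes are computed; for any String, for example, hashCode() is s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], where n is the length of the String. In your case you are combining two fields rather than characters, so the key's hash is word1.hashCode() * 31^0 + word2.hashCode() * 31^1. The multiplier makes the result depend on field order: without it, (a, b) and (b, a) would always hash to the same value. 31 is the conventional choice because it is an odd prime.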
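A small plain-Java sketch of the 31-based formula (HashDemo and its methods are hypothetical names). stringHash reproduces the String.hashCode() formula quoted above, and the two pair hashes show what the multiplier buys. In a MapReduce job this matters beyond hash tables, because the default HashPartitioner chooses a reducer from the key's hashCode().

```java
public class HashDemo {

    // Same result as String.hashCode(): s[0]*31^(n-1) + ... + s[n-1],
    // computed with Horner's rule and int overflow, like the JDK.
    static int stringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // The pair hash from WordPairCountKey.hashCode().
    static int pairHash(String word1, String word2) {
        return word1.hashCode() + 31 * word2.hashCode();
    }

    // Hypothetical variant without the multiplier, for comparison.
    static int naivePairHash(String word1, String word2) {
        return word1.hashCode() + word2.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(stringHash("ab") == "ab".hashCode()); // true
        // Without 31, swapping the columns never changes the hash.
        System.out.println(naivePairHash("a", "b") == naivePairHash("b", "a")); // true
        // With 31: 97 + 31*98 = 3135 vs 98 + 31*97 = 3105.
        System.out.println(pairHash("a", "b") == pairHash("b", "a")); // false
    }
}
```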
 
>
>2nd question: What if my input file has 3 columns instead of 2? How would you write the compare method? Also, if anyone can map this to a real-world scenario, that would be really helpful.
>
You would extend the same approach to the third column:

    @Override
    public int compareTo(WordPairCountKey o) {
        int diff = word1.compareTo(o.word1);
        if (diff == 0) {
            diff = word2.compareTo(o.word2);
        }
        if (diff == 0) {
            diff = word3.compareTo(o.word3); // word3 is the new third-column field
        }
        return diff;
    }
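Adding a third column touches more than compareTo: write(), readFields(), hashCode() and toString() need the new field too, written and read in the same order. Below is a sketch assuming a hypothetical WordTripleKey class. It implements plain Comparable only so that it compiles without Hadoop on the classpath; a real key would implement org.apache.hadoop.io.WritableComparable<WordTripleKey>. It also includes an equals() consistent with compareTo and hashCode.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Objects;

// Hypothetical three-column key; in a real job this would implement
// org.apache.hadoop.io.WritableComparable<WordTripleKey>.
public class WordTripleKey implements Comparable<WordTripleKey> {

    private String word1;
    private String word2;
    private String word3;

    public WordTripleKey() { } // Hadoop creates keys through a no-arg constructor

    public WordTripleKey(String word1, String word2, String word3) {
        this.word1 = word1;
        this.word2 = word2;
        this.word3 = word3;
    }

    // Column by column; a later column only breaks ties in the earlier ones.
    @Override
    public int compareTo(WordTripleKey o) {
        int diff = word1.compareTo(o.word1);
        if (diff == 0) {
            diff = word2.compareTo(o.word2);
        }
        if (diff == 0) {
            diff = word3.compareTo(o.word3);
        }
        return diff;
    }

    // Equal exactly when compareTo returns 0.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof WordTripleKey)) {
            return false;
        }
        return compareTo((WordTripleKey) other) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(word1, word2, word3); // also a 31-based combination
    }

    // Same signatures as Writable.write / Writable.readFields.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word1);
        out.writeUTF(word2);
        out.writeUTF(word3);
    }

    public void readFields(DataInput in) throws IOException {
        word1 = in.readUTF();
        word2 = in.readUTF();
        word3 = in.readUTF();
    }

    @Override
    public String toString() {
        return "[word1=" + word1 + ", word2=" + word2 + ", word3=" + word3 + "]";
    }
}
```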
   

>
>
>
>    @Override
>    public int compareTo(WordPairCountKey o) {
>        int diff = word1.compareTo(o.word1);
>        if (diff == 0) {
>            diff = word2.compareTo(o.word2);
>        }
>        return diff;
>    }
>   
>    @Override
>    public int hashCode() {
>        return word1.hashCode() + 31 * word2.hashCode();
>    }
>
>
>******************************
>
>Here is my input file wordpair.txt
>
>******************************
>
>a    b
>a    c
>a    b
>a    d
>b    d
>e    f
>b    d
>e    f
>b    d
>
>**********************************
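As a rough local illustration (plain Java, not Hadoop's actual shuffle), sorting this file's pairs with the same word1-then-word2 order as compareTo and summing a count of 1 per line gives the grouping a summing reducer would see. WordPairSortDemo, comparePairs and countPairs are hypothetical names, and the columns are assumed to be whitespace-separated.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordPairSortDemo {

    // The lines of wordpair.txt from the question.
    static final String[] LINES = {
        "a    b", "a    c", "a    b", "a    d", "b    d",
        "e    f", "b    d", "e    f", "b    d"
    };

    // Same ordering as WordPairCountKey.compareTo: word1 first, then word2.
    static int comparePairs(List<String> x, List<String> y) {
        int diff = x.get(0).compareTo(y.get(0));
        if (diff == 0) {
            diff = x.get(1).compareTo(y.get(1));
        }
        return diff;
    }

    // Local stand-in for: map emits (pair, 1), keys are sorted, reducer sums.
    static Map<List<String>, Integer> countPairs(String[] lines) {
        Map<List<String>, Integer> counts = new TreeMap<>(WordPairSortDemo::comparePairs);
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            counts.merge(Arrays.asList(cols[0], cols[1]), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countPairs(LINES).forEach((pair, n) -> System.out.println(pair + "\t" + n));
    }
}
```

Its main prints the pairs in sorted order: [a, b] 2, [a, c] 1, [a, d] 1, [b, d] 3, [e, f] 2.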
>
>
>Here is my WordPairCountKey class:
>
>*********************************
>
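>import java.io.DataInput;
>import java.io.DataOutput;
>import java.io.IOException;
>
>import org.apache.hadoop.io.WritableComparable;
>
>public class WordPairCountKey implements WritableComparable<WordPairCountKey> {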
>
>    private String word1;
>    private String word2;
>
>    @Override
>    public int compareTo(WordPairCountKey o) {
>        int diff = word1.compareTo(o.word1);
>        if (diff == 0) {
>            diff = word2.compareTo(o.word2);
>        }
>        return diff;
>    }
>   
>    @Override
>    public int hashCode() {
>        return word1.hashCode() + 31 * word2.hashCode();
>    }
>
>   
>    public String getWord1() {
>        return word1;
>    }
>
>    public void setWord1(String word1) {
>        this.word1 = word1;
>    }
>
>    public String getWord2() {
>        return word2;
>    }
>
>    public void setWord2(String word2) {
>        this.word2 = word2;
>    }
>
>    @Override
>    public void readFields(DataInput in) throws IOException {
>        word1 = in.readUTF();
>        word2 = in.readUTF();
>    }
>
>    @Override
>    public void write(DataOutput out) throws IOException {
>        out.writeUTF(word1);
>        out.writeUTF(word2);
>    }
>
>   
>    @Override
>    public String toString() {
>        return "[word1=" + word1 + ", word2=" + word2 + "]";