Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Hadoop >> mail # user >> Re: WordPairCount Mapreduce question.


Copy link to this message
-
Re: WordPairCount Mapreduce question.
Thanks Mahesh for your help.

Wondering if u can provide some insight with the below compare method using byte[] in the SecondarySort example:

public static class Comparator extends WritableComparator {
        public Comparator() {
            super(URICountKey.class);
        }

        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return compareBytes(b1, s1, l1, b2, s2, l2);
        }
    }
My question is in the below compare method that i have given we are comparing word1/word2
which makes sense but what about this byte[] comparison, is it right in assuming  it converts each objects word1/word2/word3 to byte[] and compares them.
If so is it for performance reason it is done.
Could you please verify.
Thanks
Sai
________________________________
 From: Mahesh Balija <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; Sai Sai <[EMAIL PROTECTED]>
Sent: Saturday, 23 February 2013 5:23 AM
Subject: Re: WordPairCount Mapreduce question.
 

Please check the in-line answers...
On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai <[EMAIL PROTECTED]> wrote:
>
>Hello
>
>
>I have a question about how Mapreduce sorting works internally with multiple columns.
>
>
>Below r my classes using 2 columns in an input file given below.
>
>
>
>1st question: About the method hashCode, we r adding a "31 + ", i am wondering why is this required. what does 31 refer to.
>
This is how usually hashcode is calculated for any String instance (s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]) where n stands for length of the String. Since in your case you only have 2 chars then it will be a * 31^0 + b * 31^1.
 
>
>2nd question: what if my input file has 3 columns instead of 2 how would you write a compare method and was wondering if anyone can map this to a real world scenario it will be really helpful.
>
you will extend the same approach for the third column,
 public int compareTo(WordPairCountKey o) {
        int diff = word1.compareTo(o.word1);
        if (diff == 0) {
            diff = word2.compareTo(o.word2);
            if(diff==0){
                 diff = word3.compareTo(o.word3);
            }
        }
        return diff;
    }
   

>
>
>
>    @Override
>    public int compareTo(WordPairCountKey o) {
>        int diff = word1.compareTo(o.word1);
>        if (diff == 0) {
>            diff = word2.compareTo(o.word2);
>        }
>        return diff;
>    }
>   
>    @Override
>    public int hashCode() {
>        return word1.hashCode() + 31 * word2.hashCode();
>    }
>
>
>******************************
>
>Here is my input file wordpair.txt
>
>******************************
>
>a    b
>a    c
>a    b
>a    d
>b    d
>e    f
>b    d
>e    f
>b    d
>
>**********************************
>
>
>Here is my WordPairObject:
>
>*********************************
>
>public class WordPairCountKey implements WritableComparable<WordPairCountKey> {
>
>    private String word1;
>    private String word2;
>
>    @Override
>    public int compareTo(WordPairCountKey o) {
>        int diff = word1.compareTo(o.word1);
>        if (diff == 0) {
>            diff = word2.compareTo(o.word2);
>        }
>        return diff;
>    }
>   
>    @Override
>    public int hashCode() {
>        return word1.hashCode() + 31 * word2.hashCode();
>    }
>
>   
>    public String getWord1() {
>        return word1;
>    }
>
>    public void setWord1(String word1) {
>        this.word1 = word1;
>    }
>
>    public String getWord2() {
>        return word2;
>    }
>
>    public void setWord2(String word2) {
>        this.word2 = word2;
>    }
>
>    @Override
>    public void readFields(DataInput in) throws IOException {
>        word1 = in.readUTF();
>        word2 = in.readUTF();
>    }
>
>    @Override
>    public void
 write(DataOutput out) throws IOException {
>        out.writeUTF(word1);
>        out.writeUTF(word2);
>    }
>
>   
>    @Override
>    public String toString() {
>        return "[word1=" + word1 + ", word2=" + word2 + "]";
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB