Sai Sai 2013-02-23, 12:52
Re: WordPairCount Mapreduce question.
Please check the in-line answers...

On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai <[EMAIL PROTECTED]> wrote:

>
> Hello
>
> I have a question about how MapReduce sorting works internally with
> multiple columns.
>
> Below are my classes, which use the 2 columns of the input file given below.
>
> 1st question: About the method hashCode, we are adding a "+ 31 *", and I am
> wondering why this is required. What does 31 refer to?
>
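The 31 follows the way hashCode is usually calculated for a String instance:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], where n is the length of the
String. Your key combines its two fields in the same style, so its hash is
word1.hashCode() * 31^0 + word2.hashCode() * 31^1. The number itself has no
special meaning; 31 is a conventional odd prime multiplier that makes the
result depend on the order of the fields, so (a, b) and (b, a) generally hash
differently, which a plain sum would not do.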
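
To make the effect of the multiplier concrete, here is a small sketch using only the JDK. The class HashCombineDemo and its method names are made up for illustration; only String.hashCode() and the expression from your hashCode() come from the thread.

```java
// Sketch only: HashCombineDemo and its methods are hypothetical names.
final class HashCombineDemo {

    // Same shape as the hashCode() in the quoted WordPairCountKey.
    static int weighted(String word1, String word2) {
        return word1.hashCode() + 31 * word2.hashCode();
    }

    // Naive alternative without a multiplier: the field order is lost.
    static int unweighted(String word1, String word2) {
        return word1.hashCode() + word2.hashCode();
    }

    // Recomputes String.hashCode() by hand (Horner's rule over the chars).
    static int manualStringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(manualStringHash("ab") == "ab".hashCode());    // true
        System.out.println(weighted("a", "b"));                           // 3135
        System.out.println(weighted("b", "a"));                           // 3105
        System.out.println(unweighted("a", "b") == unweighted("b", "a")); // true
    }
}
```

With the plain sum, the pairs (a, b) and (b, a) get the same hash; with the 31 weighting they do not.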
>
> 2nd question: What if my input file has 3 columns instead of 2? How would
> you write the compare method? If anyone can map this to a real-world
> scenario, that would be really helpful.
>
You would extend the same approach to the third column (and also add word3 to
hashCode(), write() and readFields()):

    @Override
    public int compareTo(WordPairCountKey o) {
        int diff = word1.compareTo(o.word1);
        if (diff == 0) {
            diff = word2.compareTo(o.word2);
            if (diff == 0) {
                diff = word3.compareTo(o.word3);
            }
        }
        return diff;
    }
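
As a rough, self-contained sketch of the three-column case: WordTripleKey below is a hypothetical class that implements plain Comparable instead of WritableComparable, so that it runs without Hadoop on the classpath. One possible real-world reading of the three columns (an assumption, not something from your input) is (country, city, street): rows sort by country first, then by city within a country, then by street within a city.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical three-column key; plain Comparable so it runs without Hadoop.
final class WordTripleKey implements Comparable<WordTripleKey> {
    private final String word1;
    private final String word2;
    private final String word3;

    WordTripleKey(String word1, String word2, String word3) {
        this.word1 = word1;
        this.word2 = word2;
        this.word3 = word3;
    }

    // Compare column by column; a later column only matters on a tie.
    @Override
    public int compareTo(WordTripleKey o) {
        int diff = word1.compareTo(o.word1);
        if (diff != 0) {
            return diff;
        }
        diff = word2.compareTo(o.word2);
        if (diff != 0) {
            return diff;
        }
        return word3.compareTo(o.word3);
    }

    // Keep equals() consistent with compareTo() and hashCode().
    @Override
    public boolean equals(Object other) {
        return other instanceof WordTripleKey && compareTo((WordTripleKey) other) == 0;
    }

    // Same 31-weighting idea as the two-column key, extended by one field.
    @Override
    public int hashCode() {
        return word1.hashCode() + 31 * word2.hashCode() + 31 * 31 * word3.hashCode();
    }

    @Override
    public String toString() {
        return word1 + " " + word2 + " " + word3;
    }

    // Returns a sorted copy, leaving the input list untouched.
    static List<WordTripleKey> sorted(List<WordTripleKey> keys) {
        List<WordTripleKey> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }
}
```

The early returns are equivalent to the nested ifs above; they just keep the method flat as more columns are added.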
>
>
>     @Override
>     public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>         }
>         return diff;
>     }
>
>     @Override
>     public int hashCode() {
>         return word1.hashCode() + 31 * word2.hashCode();
>     }
>
> ******************************
>
> Here is my input file wordpair.txt
>
> ******************************
>
> a    b
> a    c
> a    b
> a    d
> b    d
> e    f
> b    d
> e    f
> b    d
>
> **********************************
>
> Here is my WordPairObject:
>
> *********************************
>
> public class WordPairCountKey implements
> WritableComparable<WordPairCountKey> {
>
>     private String word1;
>     private String word2;
>
>     @Override
>     public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>         }
>         return diff;
>     }
>
>     @Override
>     public int hashCode() {
>         return word1.hashCode() + 31 * word2.hashCode();
>     }
>
>
>     public String getWord1() {
>         return word1;
>     }
>
>     public void setWord1(String word1) {
>         this.word1 = word1;
>     }
>
>     public String getWord2() {
>         return word2;
>     }
>
>     public void setWord2(String word2) {
>         this.word2 = word2;
>     }
>
>     @Override
>     public void readFields(DataInput in) throws IOException {
>         word1 = in.readUTF();
>         word2 = in.readUTF();
>     }
>
>     @Override
>     public void write(DataOutput out) throws IOException {
>         out.writeUTF(word1);
>         out.writeUTF(word2);
>     }
>
>
>     @Override
>     public String toString() {
>         return "[word1=" + word1 + ", word2=" + word2 + "]";
>     }
>
> }
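
One thing to note about the class above: it overrides hashCode() but not equals(), and the two are normally kept in step. The sketch below adds equals() and checks that write() and readFields() round-trip a key. PairKeySketch and roundTrip are hypothetical names, and the class deliberately leaves out "implements WritableComparable" so that it runs with only java.io; the method bodies are the same as yours.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for WordPairCountKey without the Hadoop interface.
final class PairKeySketch {
    private String word1 = "";
    private String word2 = "";

    // Hadoop creates Writables reflectively, so keep a no-arg constructor.
    PairKeySketch() {
    }

    PairKeySketch(String word1, String word2) {
        this.word1 = word1;
        this.word2 = word2;
    }

    // Fields are written in a fixed order...
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word1);
        out.writeUTF(word2);
    }

    // ...and must be read back in exactly the same order.
    public void readFields(DataInput in) throws IOException {
        word1 = in.readUTF();
        word2 = in.readUTF();
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof PairKeySketch)) {
            return false;
        }
        PairKeySketch o = (PairKeySketch) other;
        return word1.equals(o.word1) && word2.equals(o.word2);
    }

    @Override
    public int hashCode() {
        return word1.hashCode() + 31 * word2.hashCode();
    }

    // Serializes the key to bytes and reads it back into a fresh instance.
    static PairKeySketch roundTrip(PairKeySketch key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            PairKeySketch copy = new PairKeySketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```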
>
> ******************************
>
> Any help will be really appreciated.
> Thanks
> Sai
>
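
On the sorting question in general: between map and reduce, keys are ordered with compareTo, and keys that compare as equal end up in the same reduce call. The sketch below only imitates that behaviour in memory with a TreeMap on your sample input; it is not how the framework is implemented. PairCountSketch, Pair and SAMPLE are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of "sort by key, then count per key"; not Hadoop code.
final class PairCountSketch {

    // The rows of wordpair.txt from the original message.
    static final String[][] SAMPLE = {
        {"a", "b"}, {"a", "c"}, {"a", "b"}, {"a", "d"}, {"b", "d"},
        {"e", "f"}, {"b", "d"}, {"e", "f"}, {"b", "d"}
    };

    // Two-column key with the same compareTo as WordPairCountKey.
    static final class Pair implements Comparable<Pair> {
        final String word1;
        final String word2;

        Pair(String word1, String word2) {
            this.word1 = word1;
            this.word2 = word2;
        }

        @Override
        public int compareTo(Pair o) {
            int diff = word1.compareTo(o.word1);
            if (diff == 0) {
                diff = word2.compareTo(o.word2);
            }
            return diff;
        }

        @Override
        public String toString() {
            return word1 + " " + word2;
        }
    }

    // TreeMap orders and groups keys purely through compareTo.
    static Map<Pair, Integer> count(String[][] rows) {
        Map<Pair, Integer> counts = new TreeMap<>();
        for (String[] row : rows) {
            counts.merge(new Pair(row[0], row[1]), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {a b=2, a c=1, a d=1, b d=3, e f=2}
        System.out.println(count(SAMPLE));
    }
}
```

The pairs come out ordered by word1 first and by word2 within the same word1, which is the order your compareTo defines.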
Mahesh Balija 2013-02-25, 08:14
Harsh J 2013-02-25, 09:17