Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
MapReduce >> mail # user >> Re: WordPairCount Mapreduce question.


+
Sai Sai 2013-02-23, 12:52
Copy link to this message
-
Re: WordPairCount Mapreduce question.
Please check the in-line answers...

On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai <[EMAIL PROTECTED]> wrote:

>
> Hello
>
> I have a question about how Mapreduce sorting works internally with
> multiple columns.
>
> Below r my classes using 2 columns in an input file given below.
>
> 1st question: About the method hashCode, we r adding a "31 + ", i am
> wondering why is this required. what does 31 refer to.
>
This is how usually hashcode is calculated for any String instance
(s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]) where n stands for length of
the String. Since in your case you only have 2 chars then it will be a *
31^0 + b * 31^1.
>
> 2nd question: what if my input file has 3 columns instead of 2 how would
> you write a compare method and was wondering if anyone can map this to a
> real world scenario it will be really helpful.
>
you will extend the same approach for the third column,
 public int compareTo(WordPairCountKey o) {
        int diff = word1.compareTo(o.word1);
        if (diff == 0) {
            diff = word2.compareTo(o.word2);
            if(diff==0){
                 diff = word3.compareTo(o.word3);
            }
        }
        return diff;
    }
>
>
>     @Override
>     public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>         }
>         return diff;
>     }
>
>     @Override
>     public int hashCode() {
>         return word1.hashCode() + 31 * word2.hashCode();
>     }
>
> ******************************
>
> Here is my input file wordpair.txt
>
> ******************************
>
> a    b
> a    c
> a    b
> a    d
> b    d
> e    f
> b    d
> e    f
> b    d
>
> **********************************
>
> Here is my WordPairObject:
>
> *********************************
>
> public class WordPairCountKey implements
> WritableComparable<WordPairCountKey> {
>
>     private String word1;
>     private String word2;
>
>     @Override
>     public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>         }
>         return diff;
>     }
>
>     @Override
>     public int hashCode() {
>         return word1.hashCode() + 31 * word2.hashCode();
>     }
>
>
>     public String getWord1() {
>         return word1;
>     }
>
>     public void setWord1(String word1) {
>         this.word1 = word1;
>     }
>
>     public String getWord2() {
>         return word2;
>     }
>
>     public void setWord2(String word2) {
>         this.word2 = word2;
>     }
>
>     @Override
>     public void readFields(DataInput in) throws IOException {
>         word1 = in.readUTF();
>         word2 = in.readUTF();
>     }
>
>     @Override
>     public void write(DataOutput out) throws IOException {
>         out.writeUTF(word1);
>         out.writeUTF(word2);
>     }
>
>
>     @Override
>     public String toString() {
>         return "[word1=" + word1 + ", word2=" + word2 + "]";
>     }
>
> }
>
> ******************************
>
> Any help will be really appreciated.
> Thanks
> Sai
>
+
Mahesh Balija 2013-02-25, 08:14
+
Harsh J 2013-02-25, 09:17
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB