MapReduce, mail # user - Re: WordPairCount Mapreduce question.


+
Sai Sai 2013-02-23, 12:52
+
Mahesh Balija 2013-02-23, 13:23
Re: WordPairCount Mapreduce question.
Mahesh Balija 2013-02-25, 08:14
The byte array comparison is there for performance reasons only, but it does
NOT work the way you are thinking.
This method comes from an interface called RawComparator, which declares the
prototype (public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2,
int l2);) for this method.
During the sort phase, this implementation lets the framework compare the
keys directly on the serialized bytes read from the stream, without needing
to deserialize them into objects.
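
To make the idea concrete, here is a minimal, self-contained sketch of what a raw byte comparison does. It does not use Hadoop at all: the class name RawCompareSketch is hypothetical, and its compareBytes method is only a stand-in written to mimic a lexicographic comparison of unsigned bytes, which is my understanding of what WritableComparator.compareBytes does. The two "keys" are plain UTF-8 text sitting in one shared buffer, which is an assumption made for illustration and not the real Writable serialization.

```java
import java.nio.charset.StandardCharsets;

class RawCompareSketch {

    // Lexicographic comparison of two byte ranges, treating each byte as unsigned.
    // Hypothetical stand-in for WritableComparator.compareBytes.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int end1 = s1 + l1;
        int end2 = s2 + l2;
        for (int i = s1, j = s2; i < end1 && j < end2; i++, j++) {
            int a = b1[i] & 0xff;
            int b = b2[j] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // All shared bytes are equal: the shorter range sorts first.
        return l1 - l2;
    }

    public static void main(String[] args) {
        // Two serialized keys stored back to back in the same buffer.
        byte[] buffer = ("a\tb" + "a\tc").getBytes(StandardCharsets.UTF_8);

        // Compare key 1 (offset 0, length 3) with key 2 (offset 3, length 3)
        // without building any String or key object.
        int result = compareBytes(buffer, 0, 3, buffer, 3, 3);
        System.out.println(result < 0); // true: "a\tb" sorts before "a\tc"
    }
}
```

One caveat: a plain byte comparison gives the same order as compareTo only if the serialized form preserves that order. If each field is written with a length prefix, for example, the prefix bytes are compared first, so it is worth checking how the key's write method lays out its bytes.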

Best,
Mahesh Balija,
CalsoftLabs.

On Sun, Feb 24, 2013 at 5:01 PM, Sai Sai <[EMAIL PROTECTED]> wrote:

> Thanks Mahesh for your help.
>
> Wondering if you can provide some insight into the compare method below,
> which uses byte[], in the SecondarySort example:
>
> public static class Comparator extends WritableComparator {
>         public Comparator() {
>             super(URICountKey.class);
>         }
>
>         public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2,
> int l2) {
>             return compareBytes(b1, s1, l1, b2, s2, l2);
>         }
>     }
>
> My question: in the compare method I have given below, we are
> comparing word1/word2,
> which makes sense. But what about this byte[] comparison? Is it right to
> assume that it converts each object's word1/word2/word3 to byte[] and
> compares them?
> If so, is it done for performance reasons?
> Could you please verify?
> Thanks
> Sai
>   ------------------------------
> *From:* Mahesh Balija <[EMAIL PROTECTED]>
> *To:* [EMAIL PROTECTED]; Sai Sai <[EMAIL PROTECTED]>
> *Sent:* Saturday, 23 February 2013 5:23 AM
> *Subject:* Re: WordPairCount Mapreduce question.
>
> Please check the in-line answers...
>
> On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai <[EMAIL PROTECTED]> wrote:
>
>
> Hello
>
> I have a question about how Mapreduce sorting works internally with
> multiple columns.
>
> Below are my classes, using 2 columns from the input file given below.
>
> 1st question: About the method hashCode, we are adding a "31 + "; I am
> wondering why this is required. What does 31 refer to?
>
> This is how the hash code is usually calculated for any String instance:
> s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], where n is the length of
> the String. In your case the key has only 2 fields, so the same scheme
> gives word1.hashCode() * 31^0 + word2.hashCode() * 31^1.
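
A small sketch of the hashCode point above, in plain Java with no Hadoop classes. The class HashCodeSketch and its method names are hypothetical. The first method recomputes the polynomial documented for java.lang.String.hashCode(); the second combines two fields the same way the key class in this thread does. Multiplying one field by 31 makes the result depend on field order, so the pairs (a, b) and (b, a) do not end up with the same hash the way they would with a plain sum.

```java
class HashCodeSketch {

    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], evaluated with Horner's rule.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Same combination as the hashCode() of the two-field key in this thread.
    static int pairHash(String word1, String word2) {
        return word1.hashCode() + 31 * word2.hashCode();
    }

    public static void main(String[] args) {
        // The hand-written polynomial agrees with the built-in String hash.
        System.out.println(polynomialHash("ab") == "ab".hashCode()); // true

        // "a".hashCode() is 97 and "b".hashCode() is 98.
        System.out.println(pairHash("a", "b")); // 97 + 31 * 98 = 3135
        System.out.println(pairHash("b", "a")); // 98 + 31 * 97 = 3105
    }
}
```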
>
>
>
> 2nd question: what if my input file has 3 columns instead of 2 how would
> you write a compare method and was wondering if anyone can map this to a
> real world scenario it will be really helpful.
>
> You would extend the same approach to the third column:
>     public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>             if (diff == 0) {
>                 diff = word3.compareTo(o.word3);
>             }
>         }
>         return diff;
>     }
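
For anyone who wants to try the three-column ordering outside a MapReduce job, here is a minimal sketch. It assumes a hypothetical WordTripleKey class that implements plain java.lang.Comparable instead of Hadoop's WritableComparable, so it runs with the JDK alone; the sample rows are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class WordTripleKey implements Comparable<WordTripleKey> {
    final String word1;
    final String word2;
    final String word3;

    WordTripleKey(String word1, String word2, String word3) {
        this.word1 = word1;
        this.word2 = word2;
        this.word3 = word3;
    }

    // Compare column by column; a later column only breaks ties in the earlier ones.
    @Override
    public int compareTo(WordTripleKey o) {
        int diff = word1.compareTo(o.word1);
        if (diff == 0) {
            diff = word2.compareTo(o.word2);
            if (diff == 0) {
                diff = word3.compareTo(o.word3);
            }
        }
        return diff;
    }

    @Override
    public String toString() {
        return word1 + " " + word2 + " " + word3;
    }

    // Builds a small unsorted sample and returns it sorted by compareTo.
    static List<WordTripleKey> sortedSample() {
        List<WordTripleKey> keys = new ArrayList<>();
        keys.add(new WordTripleKey("b", "a", "a"));
        keys.add(new WordTripleKey("a", "c", "a"));
        keys.add(new WordTripleKey("a", "b", "d"));
        keys.add(new WordTripleKey("a", "b", "c"));
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(sortedSample()); // [a b c, a b d, a c a, b a a]
    }
}
```

The same pattern fits any record that sorts by one field and then breaks ties with the next, for example (year, month, day).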
>
>
>
>
>     @Override
>     public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>         }
>         return diff;
>     }
>
>     @Override
>     public int hashCode() {
>         return word1.hashCode() + 31 * word2.hashCode();
>     }
>
> ******************************
>
> Here is my input file wordpair.txt
>
> ******************************
>
> a    b
> a    c
> a    b
> a    d
> b    d
> e    f
> b    d
> e    f
> b    d
>
> **********************************
>
> Here is my WordPairObject:
>
> *********************************
>
> public class WordPairCountKey implements
> WritableComparable<WordPairCountKey> {
>
>     private String word1;
>     private String word2;
>
>     @Override
>     public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>         }
>         return diff;
>     }
>
>     @Override
>     public int hashCode() {
>         return word1.hashCode() + 31 * word2.hashCode();
>     }
>
>
>     public String getWord1() {
+
Harsh J 2013-02-25, 09:17