
Hadoop, mail # user - Re: Sorting Values sent to reducer NOT based on KEY (Depending on part of VALUE)


Vikas Jadhav 2013-04-24, 04:32
Thanks for the reply.

I will try to implement it. I think the problem in my case is that I have
modified the mapper's write function (context.write) so that it writes the
same key/value pair multiple times. For this purpose I have also modified
the partitioner class: instead of returning a single value, my partitioner
returns an integer array listing the partitions to which the key/value pair
should be written.
On Tue, Apr 23, 2013 at 1:15 PM, Sofia Georgiakaki
<[EMAIL PROTECTED]> wrote:

> Hello,
>
> Sorting is done by the sort comparator, which orders the map output by
> key only. A possible solution would be the following:
> You could write a custom class that implements WritableComparable (let's
> call it MyCompositeFieldWritableComparable) and stores both your current
> key and the part of the value that you want your sorting to be based on.
> As I understand from your description, this writable class will have two
> IntWritable fields, e.g.
> (fieldA, fieldB)
> (0,4)
> (1,1)
> (2,0)
> Implement write() and readFields() (required by Writable) and
> compareTo(), and override equals() and hashCode() in your custom writable.
> Sorting before the reduce phase will be performed based on the
> compareTo() implementation of your custom writable, so you can write it
> in a way that compares only fieldB.
> Be careful in how you implement
> MyCompositeFieldWritableComparable.compareTo() (unless you set a separate
> grouping comparator, it is also used to group <key, list(values)> for the
> reducer), MyCompositeFieldWritableComparable.equals() and
> MyCompositeFieldWritableComparable.hashCode().
> So your new KEY class will be MyCompositeFieldWritableComparable.
> As an alternative and cleaner implementation, write the
> MyCompositeFieldWritableComparable class and also a
> HashOnOneFieldPartitioner class (which extends Partitioner) that will do
> something like this:
>
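> import org.apache.hadoop.mapreduce.Partitioner;
>
> public class HashOnOneFieldPartitioner<K, V> extends Partitioner<K, V> {
>   @Override
>   public int getPartition(K key, V value, int numReduceTasks) {
>     if (key instanceof MyCompositeFieldWritableComparable) {
>       // Partition on fieldB only
>       return (((MyCompositeFieldWritableComparable) key).hashCodeBasedOnFieldB()
>           & Integer.MAX_VALUE) % numReduceTasks;
>     }
>     // Any other key type: default hash partitioning
>     return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
>   }
> }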
>
>
>
> You can also find related articles on the web, e.g.
> http://riccomini.name/posts/hadoop/2009-11-13-sort-reducer-input-value-hadoop/
>
> Have a nice day,
> Sofia
>
>   ------------------------------
>  *From:* Vikas Jadhav <[EMAIL PROTECTED]>
> *To:* [EMAIL PROTECTED]
> *Sent:* Tuesday, April 23, 2013 8:44 AM
> *Subject:* Sorting Values sent to reducer NOT based on KEY (Depending on
> part of VALUE)
>
> Hi
>
> How can I sort values in Hadoop using Hadoop's standard sorting algorithm
> (i.e. the sorting facility provided by Hadoop)?
>
> Requirement:
>
> 1) Values should be sorted depending on some part of the value.
>
> For example, given these (KEY, VALUE) pairs:
>
>  (0,"BC,4,XY")
>  (1,"DC,1,PQ")
>  (2,"EF,0,MN")
>
> the sequence reaching the reducer should be sorted as:
>
> (2,"EF,0,MN")
> (1,"DC,1,PQ")
> (0,"BC,4,XY")
>
> i.e. sorted on the second attribute of the value.
>
> Thanks
>
>
>
> --
> Regards,
> Vikas
--
Regards,
Vikas
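A minimal plain-Java sketch of the composite-key idea from Sofia's reply, applied to the records in the original question. It assumes a single reducer and uses Collections.sort() as a stand-in for the shuffle's sort. CompositeKey and sortedKeys() are hypothetical names standing in for MyCompositeFieldWritableComparable; in a real job the class would implement org.apache.hadoop.io.WritableComparable and also need write(DataOutput) and readFields(DataInput), which are left out so the example runs on a plain JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public class CompositeKeySortDemo {

    // Hypothetical stand-in for MyCompositeFieldWritableComparable. In Hadoop it
    // would implement WritableComparable and add write()/readFields().
    static final class CompositeKey implements Comparable<CompositeKey> {
        final int fieldA; // the original map output key
        final int fieldB; // the part of the value to sort on

        CompositeKey(int fieldA, int fieldB) {
            this.fieldA = fieldA;
            this.fieldB = fieldB;
        }

        // Order by fieldB; fall back to fieldA so that compareTo() returns 0
        // only when equals() is true.
        @Override
        public int compareTo(CompositeKey other) {
            int byB = Integer.compare(fieldB, other.fieldB);
            return byB != 0 ? byB : Integer.compare(fieldA, other.fieldA);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof CompositeKey)) {
                return false;
            }
            CompositeKey k = (CompositeKey) o;
            return fieldA == k.fieldA && fieldB == k.fieldB;
        }

        @Override
        public int hashCode() {
            return Objects.hash(fieldA, fieldB);
        }
    }

    // Builds a composite key from each (KEY, "XX,n,YY") record, sorts the keys
    // as the shuffle would for a single reducer, and returns the original keys
    // in that order.
    static List<Integer> sortedKeys(int[] keys, String[] values) {
        List<CompositeKey> composite = new ArrayList<>();
        for (int i = 0; i < keys.length; i++) {
            int fieldB = Integer.parseInt(values[i].split(",")[1].trim());
            composite.add(new CompositeKey(keys[i], fieldB));
        }
        Collections.sort(composite);
        List<Integer> order = new ArrayList<>();
        for (CompositeKey k : composite) {
            order.add(k.fieldA);
        }
        return order;
    }

    public static void main(String[] args) {
        int[] keys = {0, 1, 2};
        String[] values = {"BC,4,XY", "DC,1,PQ", "EF,0,MN"};
        System.out.println(sortedKeys(keys, values)); // [2, 1, 0]
    }
}
```

Running main() prints [2, 1, 0], the key order asked for in the question. This compareTo() breaks ties on fieldA. Comparing only fieldB, as the reply suggests, also sorts correctly, but because the grouping comparator defaults to the sort comparator, the reducer would then receive all keys that share a fieldB in a single reduce() call.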
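The arithmetic in HashOnOneFieldPartitioner can be checked without a cluster. This sketch copies the expression from getPartition() into a plain static method; hashCodeBasedOnFieldB() here is a hypothetical helper that hashes only fieldB, which is what the reply implies but does not show. It illustrates that records sharing a fieldB go to the same reducer, and why the hash is masked with Integer.MAX_VALUE rather than passed through Math.abs().

```java
public class FieldBPartitionDemo {

    // Same arithmetic as the return statements in getPartition():
    // masking the sign bit keeps the partition number non-negative.
    static int partitionFor(int hash, int numReduceTasks) {
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical stand-in for hashCodeBasedOnFieldB(): only fieldB is hashed,
    // so fieldA never influences the choice of reducer.
    static int hashCodeBasedOnFieldB(int fieldA, int fieldB) {
        return Integer.hashCode(fieldB);
    }

    public static void main(String[] args) {
        int reducers = 3;
        // Records with the same fieldB land on the same reducer, whatever fieldA is.
        System.out.println(partitionFor(hashCodeBasedOnFieldB(0, 4), reducers)); // 1
        System.out.println(partitionFor(hashCodeBasedOnFieldB(5, 4), reducers)); // 1
        // Math.abs() is not a safe substitute for the mask, because
        // Math.abs(Integer.MIN_VALUE) is still negative.
        System.out.println(Math.abs(Integer.MIN_VALUE) % reducers);             // -2
        System.out.println(partitionFor(Integer.MIN_VALUE, reducers));          // 0
    }
}
```

In a real job the partitioner would be registered with job.setPartitionerClass(HashOnOneFieldPartitioner.class).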