MapReduce >> mail # user >> Re: Find reducer for a key


Re: Find reducer for a key
Hi Hemanth,

Thanks for your reply.
Yes, this partially answers my question. I know how the hash
partitioner works and had guessed something similar.
The piece I was missing was that mapred.task.partition returns the
partition number of the reducer.
So, putting all the pieces together, I understand that for each key in
the file I have to call the HashPartitioner.
Then I have to compare the returned index with the one retrieved by
Configuration.getInt("mapred.task.partition").
If they are equal, that key will be served by this reducer. Is this correct?
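
For illustration, the check described above could be sketched as follows. This is a minimal sketch, not code from the thread: the class and method names (ReducerKeyCheck, partitionFor, servedByThisReducer) are hypothetical, it has no Hadoop dependency, and it assumes the job uses the default HashPartitioner. It takes the key's hash code as an int on purpose: the hash must come from the actual map output key type, and as far as I know Text.hashCode() is computed over the encoded bytes, so it does not match String.hashCode().

```java
// Sketch only: mirrors the HashPartitioner formula quoted later in this thread.
final class ReducerKeyCheck {

    // Same arithmetic as HashPartitioner.getPartition:
    // clear the sign bit, then take the remainder by the number of reducers.
    static int partitionFor(int keyHash, int numReduceTasks) {
        return (keyHash & Integer.MAX_VALUE) % numReduceTasks;
    }

    // True when a key with this hash code would be routed to the reducer
    // whose partition number is myPartition.
    static boolean servedByThisReducer(int keyHash, int numReduceTasks, int myPartition) {
        return partitionFor(keyHash, numReduceTasks) == myPartition;
    }

    public static void main(String[] args) {
        // Hypothetical values. Inside a real reducer they would presumably come from
        // context.getNumReduceTasks() and conf.getInt("mapred.task.partition", -1).
        int numReduceTasks = 4;
        int myPartition = 3;

        // Stand-in hash: use the hashCode() of the real key type in an actual job.
        int keyHash = "some-key".hashCode();
        System.out.println(servedByThisReducer(keyHash, numReduceTasks, myPartition));
    }
}
```

If the job plugs in a custom partitioner, the same comparison would have to call that partitioner's getPartition instead of this formula.
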
To answer your question:
On the reduce side of an MR job, I want to load some data from a file
into an in-memory structure. I don't need to store the whole file in
each reducer, only the lines related to the keys that particular
reducer will receive.
So my intention is to determine the keys in the setup method and store
only the needed lines.
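
A rough sketch of that filtering step, under stated assumptions: each line of the file is itself a key, the job uses the default HashPartitioner, and the hash function passed in matches hashCode() of the real map output key type. The names SetupLineFilter and linesForPartition are hypothetical, and the sketch uses only the Java standard library so it can run outside a cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

// Sketch only: keeps the lines whose key would be partitioned to one reducer.
final class SetupLineFilter {

    // keyHash must reproduce hashCode() of the job's map output key type.
    static List<String> linesForPartition(List<String> lines,
                                          ToIntFunction<String> keyHash,
                                          int numReduceTasks,
                                          int myPartition) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            // Same formula as HashPartitioner.getPartition.
            int partition = (keyHash.applyAsInt(line) & Integer.MAX_VALUE) % numReduceTasks;
            if (partition == myPartition) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical input; a real setup method would read these lines from the file.
        List<String> lines = Arrays.asList("alpha", "beta", "gamma", "delta");

        // String::hashCode is a stand-in. With Text keys, something like
        // s -> new Text(s).hashCode() would be needed instead.
        List<String> mine = linesForPartition(lines, String::hashCode, 4, 0);
        System.out.println(mine);
    }
}
```

Only the kept lines would then go into the in-memory structure, so each reducer holds roughly 1/numReduceTasks of the file if keys hash evenly.
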

Thanks,
Alberto
On 28 March 2013 11:01, Hemanth Yamijala <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Not sure if I am answering your question, but this is the background. Every
> MapReduce job has a partitioner associated with it. The default partitioner is
> a HashPartitioner. As a user, you can also write your own partitioner and
> plug it into the job. The partitioner is responsible for splitting the map
> output key space among the reducers.
>
> So the reducer a key will go to is basically the value returned by the
> partitioner's getPartition method. For example, this is the code
> in the HashPartitioner:
>
>   public int getPartition(K2 key, V2 value,
>                           int numReduceTasks) {
>     return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
>   }
>
> mapred.task.partition is the configuration key that defines the partition
> number of this reducer.
>
> I guess you can piece these bits together into what you want. However, I
> am interested in understanding why you want to know this. Can you share
> some info?
>
> Thanks
> Hemanth
>
>
> On Thu, Mar 28, 2013 at 2:17 PM, Alberto Cordioli
> <[EMAIL PROTECTED]> wrote:
>>
>> Hi everyone,
>>
>> How can I know, in the setup method, which keys are associated with a
>> particular reducer?
>> Let's assume the setup method reads from a file where each line
>> is a string that will become a key emitted by the mappers.
>> For each of these lines I would like to know whether the string will be a
>> key associated with the current reducer or not.
>>
>> I read something about mapred.task.partition and mapred.task.id, but I
>> didn't understand how to use them.
>>
>>
>> Thanks,
>> Alberto
>>
>>
>> --
>> Alberto Cordioli
>
>

--
Alberto Cordioli
Alberto Cordioli 2013-03-30, 13:28