MapReduce >> mail # user >> MAP_INPUT_RECORDS counter in the reducer


Re: MAP_INPUT_RECORDS counter in the reducer
Hi again,
I came across this link:
http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201112.mbox/%3CCAFE9998.2FEF6%[EMAIL PROTECTED]%3E
It looks like a nice idea. Has anyone tried something similar?

Thanks
On Wed, Sep 18, 2013 at 4:46 PM, Shahab Yunus <[EMAIL PROTECTED]> wrote:

> Yes, you are correct that the copy phase starts while the maps are still
> running and that the reduce function is not called until everything is
> done. But aren't the Reduce tasks already 'initialized' at that point?
> As far as I know (and I might be wrong), they will not have the map input
> records counter, which was my point.
>
> Regards,
> Shahab
>
>
> On Tue, Sep 17, 2013 at 11:09 PM, Rahul Bhattacharjee <
> [EMAIL PROTECTED]> wrote:
>
>> Shahab,
>>
>> One question: you mentioned, "In the normal configuration, the issue
>> here is that Reducers can start before all the Maps have finished so it is
>> not possible to get the number (or make sense of it even if you are able
>> to)."
>>
>> I think reducers start copying the data from the completed map
>> tasks, but will not start the actual reduce process until the data from
>> all the mappers has been pulled in.
>>
>> So the counter call Yaron made might work if it is invoked from the
>> reduce method.
>>
>> Thanks,
>> Rahul
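Rahul's description of the shuffle (copying starts as soon as map tasks finish, but the reduce step waits for every map output) can be pictured with a small toy model. This is only an illustration under simplifying assumptions: one reducer, each map output reduced to a single `long`, and a thread pool standing in for map tasks. `ShuffleModel` and `runJob` are hypothetical names, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy model, not Hadoop code: map tasks run concurrently and hand their output
// to the reducer as they finish. The reducer copies each output as soon as it
// is available, but only runs its "reduce" step after every map output is in.
final class ShuffleModel {

    /** Simulates numMaps map tasks (map i emits i + 1) and one reducer that sums them. */
    static long runJob(int numMaps) {
        BlockingQueue<Long> finishedMapOutputs = new LinkedBlockingQueue<>();
        ExecutorService mapPool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < numMaps; i++) {
            final long output = i + 1;
            // Each submitted task stands in for one map task finishing.
            mapPool.submit(() -> finishedMapOutputs.add(output));
        }
        mapPool.shutdown();

        try {
            // Copy phase: may begin while other maps are still running.
            List<Long> copied = new ArrayList<>();
            while (copied.size() < numMaps) {
                copied.add(finishedMapOutputs.take()); // blocks until the next map finishes
            }
            mapPool.awaitTermination(1, TimeUnit.MINUTES);

            // Reduce phase: reached only after all map outputs have been copied.
            long sum = 0;
            for (long value : copied) {
                sum += value;
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for map outputs", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runJob(10)); // 1 + 2 + ... + 10 = 55
    }
}
```

The result is the same regardless of the order in which the simulated maps finish, which mirrors why a reducer can overlap copying with running maps without affecting its output.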
>>
>>
>>
>> On Wed, Sep 18, 2013 at 7:38 AM, java8964 java8964 <[EMAIL PROTECTED]> wrote:
>>
>>> Or you could do the calculation in the reducer's close() method, although
>>> I am not sure you can get the Mapper's count in the reducer.
>>>
>>> But even if you can't, here is what you can do:
>>> 1) Save the JobConf reference in your Mapper's configure() method
>>> 2) In the mapper's close() method, store the MAP_INPUT_RECORDS counter in
>>> the configuration object as your own property
>>> 3) Retrieve that property in the reducer's close() method; then you have
>>> both numbers at that point.
>>>
>>> Yong
>>>
>>> ------------------------------
>>> Date: Tue, 17 Sep 2013 09:49:06 -0400
>>> Subject: Re: MAP_INPUT_RECORDS counter in the reducer
>>> From: [EMAIL PROTECTED]
>>> To: [EMAIL PROTECTED]
>>>
>>>
>>> In the normal configuration, the issue here is that Reducers can start
>>> before all the Maps have finished, so it is not possible to get the number
>>> (or make sense of it even if you are able to).
>>>
>>> Having said that, you can specifically make sure that Reducers don't
>>> start until all your maps have completed. It will of course slow down your
>>> job. I don't know whether it will work with this option, but you can
>>> try it (until the experts have some advice).
>>>
>>> Regards,
>>> Shahab
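The option Shahab mentions is the reduce "slow start" setting: `mapreduce.job.reduce.slowstart.completedmaps` in Hadoop 2 (`mapred.reduce.slowstart.completed.maps` in Hadoop 1), the fraction of map tasks that must complete before reducers are scheduled, with a default of 0.05. Setting it to 1.0 in the driver (for example via `job.getConfiguration().setFloat(...)`) holds reducers back until every map is done. The sketch below is a simplified model of that threshold rule only; the exact scheduling logic differs between Hadoop versions, and `SlowStart` and `mayLaunchReducers` are hypothetical names.

```java
// Toy model, not Hadoop's scheduler: reducers may be launched once the number
// of completed maps reaches ceil(slowstart * totalMaps).
final class SlowStart {

    /** Assumed default fraction, matching Hadoop's documented default of 0.05. */
    static final float DEFAULT_SLOWSTART = 0.05f;

    static boolean mayLaunchReducers(int completedMaps, int totalMaps, float slowstart) {
        if (slowstart < 0f || slowstart > 1f) {
            throw new IllegalArgumentException("slowstart must be in [0, 1]");
        }
        int required = (int) Math.ceil(slowstart * totalMaps);
        return completedMaps >= required;
    }

    public static void main(String[] args) {
        // Default-like setting: reducers can start after the first of 20 maps.
        System.out.println(mayLaunchReducers(1, 20, DEFAULT_SLOWSTART));  // true
        // Setting 1.0: reducers wait until every map has completed.
        System.out.println(mayLaunchReducers(19, 20, 1.0f));              // false
        System.out.println(mayLaunchReducers(20, 20, 1.0f));              // true
    }
}
```

This setting only changes when reduce tasks are launched; counters are tracked per task, so on its own it does not make map-side counters such as MAP_INPUT_RECORDS show up in a reduce task's own context.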
>>>
>>>
>>> On Tue, Sep 17, 2013 at 6:09 AM, Yaron Gonen <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi,
>>> Is there a way for the reducer to get the total number of input records
>>> to the map phase?
>>> For example, I want the reducer to normalize a sum by dividing it by the
>>> number of records. I tried getting the value of that counter by using the
>>> line:
>>>
>>> context.getCounter(Task.Counter.MAP_INPUT_RECORDS).getValue();
>>>
>>> in the reducer code, but I got 0.
>>>
>>> Thanks!
>>> Yaron
>>>
>>>
>>>
>>
>
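A likely explanation for the 0 in the original question is that Hadoop keeps counters per task and aggregates them into job-level totals; `context.getCounter(...)` inside a reducer refers to that reduce task's own counters, which never counted any map input. The toy model below illustrates that distinction and the normalization step, under the assumption that the job-level total is somehow available to the reducer. In a real job the aggregated value is exposed through the job's counters (for example `Job.getCounters()`), and looking it up from the reducer is something to verify against your Hadoop version. `CounterModel`, `TaskCounters`, `jobTotal`, and `normalize` are hypothetical names, not Hadoop APIs.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Toy model, not Hadoop code: each task owns its own counters, and the
// job-level value of a counter is the sum over all tasks.
final class CounterModel {

    enum Counter { MAP_INPUT_RECORDS, REDUCE_INPUT_RECORDS }

    /** Counters belonging to a single (simulated) task. */
    static final class TaskCounters {
        private final Map<Counter, Long> values = new EnumMap<>(Counter.class);

        void increment(Counter counter, long amount) {
            values.merge(counter, amount, Long::sum);
        }

        long get(Counter counter) {
            return values.getOrDefault(counter, 0L);
        }
    }

    /** Aggregates one counter across all tasks, as a job-level total. */
    static long jobTotal(List<TaskCounters> tasks, Counter counter) {
        long total = 0;
        for (TaskCounters task : tasks) {
            total += task.get(counter);
        }
        return total;
    }

    /** Normalizes a sum by the number of map input records. */
    static double normalize(double sum, long records) {
        if (records <= 0) {
            throw new IllegalArgumentException("record count must be positive");
        }
        return sum / records;
    }

    public static void main(String[] args) {
        TaskCounters map1 = new TaskCounters();
        TaskCounters map2 = new TaskCounters();
        TaskCounters reduce = new TaskCounters();
        map1.increment(Counter.MAP_INPUT_RECORDS, 3);
        map2.increment(Counter.MAP_INPUT_RECORDS, 5);
        reduce.increment(Counter.REDUCE_INPUT_RECORDS, 8);

        // The reduce task's own counters never saw any map input: prints 0.
        System.out.println(reduce.get(Counter.MAP_INPUT_RECORDS));
        // The job-level total aggregates the map tasks: prints 8.
        long total = jobTotal(Arrays.asList(map1, map2, reduce), Counter.MAP_INPUT_RECORDS);
        System.out.println(total);
        // Normalizing a sum of 40.0 by 8 records: prints 5.0.
        System.out.println(normalize(40.0, total));
    }
}
```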