I've run into this link:
Looks like a nice idea. Has anyone tried something similar?
On Wed, Sep 18, 2013 at 4:46 PM, Shahab Yunus <[EMAIL PROTECTED]> wrote:
> Yes, you are correct that the copy phase starts while the maps are running
> and that the reduce function is not called until everything is done, but
> aren't the Reduce tasks already 'initialized' at that point? If so, as far
> as I know (and I might be wrong), they will not have the map input records
> counter, which was my point.
> On Tue, Sep 17, 2013 at 11:09 PM, Rahul Bhattacharjee <
> [EMAIL PROTECTED]> wrote:
>> One question. You mentioned: "In the normal configuration, the issue
>> here is that Reducers can start before all the Maps have finished so it is
>> not possible to get the number (or make sense of it even if you are able
>> to)"
>> I think reducers start copying data from the completed map
>> tasks, but do not start the actual reduce process until data from all
>> the mappers has been pulled in.
>> So the call to the counter that Yaron made might work if invoked from the
>> reduce method.
>> On Wed, Sep 18, 2013 at 7:38 AM, java8964 java8964 <[EMAIL PROTECTED]> wrote:
>>> Or you do the calculation in the reducer close() method, even though I
>>> am not sure you can get the Mapper's count in the reducer.
>>> But even if you can't, here is what you can do:
>>> 1) Save the JobConf reference in your Mapper's configure() method
>>> 2) Store the MAP_INPUT_RECORDS counter in the configuration object as
>>> your own property, in the close() method of the mapper
>>> 3) Retrieve that property in the reducer close() method; then you have
>>> both numbers at that time.
>>> Date: Tue, 17 Sep 2013 09:49:06 -0400
>>> Subject: Re: MAP_INPUT_RECORDS counter in the reducer
>>> From: [EMAIL PROTECTED]
>>> To: [EMAIL PROTECTED]
>>> In the normal configuration, the issue here is that Reducers can start
>>> before all the Maps have finished so it is not possible to get the number
>>> (or make sense of it even if you are able to).
>>> Having said that, you can specifically make sure that Reducers don't
>>> start until all your maps have completed. It will of course slow down your
>>> job. I don't know whether with this option it will work or not, but you can
>>> try (until experts have some advice already.)
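
The message above does not name the option it has in mind. A likely candidate, stated here as an assumption, is the Hadoop 1.x property mapred.reduce.slowstart.completed.maps (renamed mapreduce.job.reduce.slowstart.completedmaps in Hadoop 2.x), which sets the fraction of map tasks that must finish before reducers are scheduled. The sketch below is plain Java with no Hadoop dependency; it only builds the -D option that a job launched through ToolRunner would accept. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class SlowstartSketch {
    // Hadoop 1.x property name (assumed to be the option meant in the thread).
    // Hadoop 2.x uses mapreduce.job.reduce.slowstart.completedmaps instead.
    static final String SLOWSTART_KEY = "mapred.reduce.slowstart.completed.maps";

    // Fraction of map tasks that must complete before reducers are scheduled.
    // 1.0 means reducers wait for every map task to finish.
    static Map<String, String> jobOverrides() {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put(SLOWSTART_KEY, "1.0");
        return overrides;
    }

    // Render the overrides as "-D key=value" generic options.
    static String toGenericOptions(Map<String, String> overrides) {
        StringJoiner joined = new StringJoiner(" ");
        for (Map.Entry<String, String> entry : overrides.entrySet()) {
            joined.add("-D").add(entry.getKey() + "=" + entry.getValue());
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(toGenericOptions(jobOverrides()));
    }
}
```

Delaying reducer start this way only changes scheduling. Whether the counter then becomes readable inside the reducer is exactly what the thread leaves open.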
>>> On Tue, Sep 17, 2013 at 6:09 AM, Yaron Gonen <[EMAIL PROTECTED]> wrote:
>>> Is there a way for the reducer to get the total number of input records
>>> to the map phase?
>>> For example, I want the reducer to normalize a sum by dividing it by the
>>> number of records. I tried getting the value of that counter by using the
>>> in the reducer code, but I got 0.
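
For illustration, the normalization described in the question is a division of the reducer's sum by the total record count. Because the counter read back as 0 in the reducer, a guard on the divisor is worth having. This is a minimal sketch in plain Java with no Hadoop dependency; the class and method names are hypothetical, and it assumes the record count has already been obtained by some means that works in your setup.

```java
class NormalizeSketch {
    // Divide a reducer-side sum by the total number of map input records.
    // A non-positive count is treated as "counter not available" and
    // rejected, rather than silently producing Infinity or NaN.
    static double normalize(double sum, long totalMapInputRecords) {
        if (totalMapInputRecords <= 0) {
            throw new IllegalStateException(
                "map input record count unavailable: " + totalMapInputRecords);
        }
        return sum / totalMapInputRecords;
    }

    public static void main(String[] args) {
        System.out.println(normalize(10.0, 4));
    }
}
```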