Re: end of input event to a mapper
Sometimes a user needs to output more records after the last record
has been seen. With the solution Jothi provided, the user still has to
cache the output collector (and reporter) object in his or her Mapper
class. This in turn requires that the framework pass the same output
collector object to every map() call, and that the object remain valid
until after close() returns. Neither requirement is explicitly stated
in the documentation of the Mapper.map() API. Personally, I think it
might be better to add a boolean flag to the map() call, or to call
map() one last time with null/null as the key/value.
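
To make the workaround concrete, here is a rough sketch of the caching
pattern. It is not code from Hadoop: the OutputCollector interface below
is a simplified stand-in for org.apache.hadoop.mapred.OutputCollector so
the example compiles without the Hadoop jars, the Reporter argument is
omitted, and CountingMapper and run() are hypothetical names. It also
bakes in the undocumented assumption mentioned above, namely that the
collector is still usable when close() is called.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class EndOfInputSketch {

    // Simplified stand-in for org.apache.hadoop.mapred.OutputCollector
    // (assumption: only collect() is modelled here).
    interface OutputCollector<K, V> {
        void collect(K key, V value) throws IOException;
    }

    // Hypothetical mapper that passes records through and emits a total
    // once the input is exhausted.
    static class CountingMapper {
        private OutputCollector<String, Long> cachedOutput; // saved from map()
        private long count;

        // Same shape as the old-API map(), minus the Reporter argument.
        void map(Long key, String value, OutputCollector<String, Long> output)
                throws IOException {
            cachedOutput = output; // remember the collector for close()
            count++;
            output.collect(value, 1L);
        }

        // Called once after the last map() call on this mapper object.
        void close() throws IOException {
            if (cachedOutput != null) { // null if map() was never called
                cachedOutput.collect("TOTAL", count);
            }
        }
    }

    // Tiny driver that imitates the framework: map() per record, then close().
    static List<String> run(String... lines) throws IOException {
        List<String> emitted = new ArrayList<>();
        OutputCollector<String, Long> collector =
                (k, v) -> emitted.add(k + "=" + v);
        CountingMapper mapper = new CountingMapper();
        long offset = 0;
        for (String line : lines) {
            mapper.map(offset++, line, collector);
        }
        mapper.close();
        return emitted;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run("a", "b", "c")); // prints [a=1, b=1, c=1, TOTAL=3]
    }
}
```

Note that if map() is never called (an empty split), close() has no
collector to write to, which is one more reason an explicit
end-of-input call would be cleaner.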

On Jul 6, 2009, at 11:13 AM, Ted Dunning wrote:

> That does not quite mean that this is the last map() call in a global
> sense.  In fact, the entire map task could be run a second time by the
> framework.
>
> Close does mean that this particular mapper object will not receive
> any more map() calls and thus can clean up any resources it owned.
>
> The original poster probably meant last map() in the smaller sense,
> but it is really important to think in terms of non-deterministic
> execution of maps, so it is probably a good thing to inject this
> slight qualification to the answer.
>
> On Mon, Jul 6, 2009 at 4:48 AM, Jothi Padmanabhan <[EMAIL PROTECTED]> wrote:
>
>> For each map task, there is a configure() method and a close() that
>> get called before and after the actual map method itself. You could
>> do your processing there. You could get the current task ID in the
>> configure method and use that to decide and perform specific map
>> related activities if you so wish.
>>
>> Jothi
>>
>>
>> On 7/6/09 5:09 PM, "Uri Shani" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi.
>>> How can the mapper task know this is the last map() call?
>>> Do I need to intervene within the Hadoop framework? where?
>>>
>>> Thanks,
>>> - Uri
>>
>>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
> 111 West Evelyn Ave. Ste. 202
> Sunnyvale, CA 94086
> http://www.deepdyve.com
> 858-414-0013 (m)
> 408-773-0220 (fax)