Sometimes a user needs to output more records after the last
record has been seen. So, with the solution Jothi provided, the user
still has to cache the output collector (and reporter) object in the
Mapper class. This also requires that the mapper be called with the
same output collector object on every map() call, and that the object
remain valid until after close() is called. Neither requirement is
explicitly stated in the documentation of the Mapper.map() API.
Personally, I think it might be better to add a boolean flag to the
map() call, or to call map() one last time with null/null as
key/value.
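
To make the caching workaround concrete, here is a minimal sketch. It does
not use the real Hadoop classes: the OutputCollector interface below is a
simplified stand-in for org.apache.hadoop.mapred.OutputCollector, and
CountingMapper, LastRecordDemo and the "__total__" key are hypothetical
names chosen for illustration. It assumes exactly the undocumented
behaviour mentioned above, namely that the collector passed to map() is
still usable inside close().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for org.apache.hadoop.mapred.OutputCollector
// (assumption: only collect() is modeled here).
interface OutputCollector<K, V> {
    void collect(K key, V value) throws IOException;
}

// Hypothetical mapper that counts its input records and emits the total
// from close(), using the collector it cached during map().
class CountingMapper {
    private OutputCollector<String, Long> cachedOutput; // saved from map()
    private long recordCount = 0;

    public void map(String key, String value,
                    OutputCollector<String, Long> output) throws IOException {
        // Cache the collector. This relies on the framework keeping the
        // object valid until close() has returned.
        cachedOutput = output;
        recordCount++;
        output.collect(key, (long) value.length());
    }

    public void close() throws IOException {
        // If the input was empty, map() never ran and nothing was cached.
        if (cachedOutput != null) {
            cachedOutput.collect("__total__", recordCount);
        }
    }
}

class LastRecordDemo {
    // Drives the mapper the way a map task would: one map() call per
    // record, followed by a single close().
    static List<String> run(String[][] records) {
        List<String> emitted = new ArrayList<>();
        OutputCollector<String, Long> collector =
            (k, v) -> emitted.add(k + "=" + v);
        CountingMapper mapper = new CountingMapper();
        try {
            for (String[] r : records) {
                mapper.map(r[0], r[1], collector);
            }
            mapper.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(run(new String[][] {{"a", "xx"}, {"b", "yyy"}}));
    }
}
```

Note the null check in close(): with an empty input the mapper has no
collector to write to, which is one more reason a final map() call or an
explicit flag would be cleaner than caching.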
On Jul 6, 2009, at 11:13 AM, Ted Dunning wrote:
> That does not quite mean that this is the last map() call in a global
> sense. In fact, the entire map task could be run a second time by the
> Close does mean that this particular mapper object will not receive
> any more map() calls and thus can clean up any resources it owned.
> The original poster probably meant the last map() in the smaller
> sense, but it is really important to think in terms of
> non-deterministic execution of maps, so it is probably a good thing
> to inject this slight qualification to the answer.
> On Mon, Jul 6, 2009 at 4:48 AM, Jothi Padmanabhan <[EMAIL PROTECTED]> wrote:
>> For each map task, there is a configure() method and a close() method
>> that are called before and after the actual map method itself. You
>> could do processing there. You could get the current task ID in the
>> configure method and use that to decide and perform specific
>> map-related activities
>> if you
>> On 7/6/09 5:09 PM, "Uri Shani" <[EMAIL PROTECTED]> wrote:
>>> How can the mapper task know this is the last map() call?
>>> Do I need to intervene within the Hadoop framework? where?
>>> - Uri