> Just one more question: does Hadoop handle reassigning failed tasks
> to different machines in some way?
Yes. If a task fails, it is retried, preferably on a different machine.
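The number of retries is configurable per job. If I remember correctly, in the old (mapred) API the relevant properties look like this, with a default of 4 attempts (please double-check the names against your version's mapred-default.xml):

```xml
<!-- Maximum attempts per map task before the task, and hence
     the job, is declared failed. Default is 4. -->
<property>
  <name>mapred.map.max.attempts</name>
  <value>4</value>
</property>

<!-- Same limit for reduce tasks. -->
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>4</value>
</property>
```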
> I saw that sometimes, usually near the end, when there are more
> "processing units" available than map() tasks left to process, the
> same map() task might be processed twice, and then one copy is
> killed when the other finishes first.
This is called speculative execution. The JobTracker monitors the
progress of tasks, and if an individual task is making slow progress
relative to the others, it launches a duplicate of that task on
another node. Whichever copy finishes first is used and the other
one is killed.
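Speculative execution is on by default and can be turned off per job if duplicated side effects (e.g. tasks writing to external systems) are a concern. Again from memory, the old-API property names are roughly these; verify them for your release:

```xml
<!-- Enable/disable speculative execution for map tasks.
     Defaults to true. -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>

<!-- Same switch for reduce tasks. -->
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
</property>
```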