Hadoop, mail # dev - Available of Intermediate data generated by mappers


Re: Available of Intermediate data generated by mappers
Nan Zhu 2010-10-13, 15:04
Yes, I finally found the corresponding code.

It's in TaskTracker.MapOutputServlet:
doGet() -> sendMapFile() -> TaskTracker.mapOutputLost()
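
To illustrate that call chain, here is a minimal sketch of the idea: the node streams a map output file to a reducer and, if the file cannot be read, reports the output as lost so the map task can be redone. This is not the TaskTracker code; the class MapOutputServingSketch, the LostOutputListener interface, and the serveMapOutput() method are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class MapOutputServingSketch {

    /** Hypothetical stand-in for the tracker-side "map output lost" callback. */
    public interface LostOutputListener {
        void mapOutputLost(String mapTaskId, String reason);
    }

    /**
     * Streams a map output file to the reducer's stream.
     * Returns true on success. On a read failure it notifies the listener
     * and returns false, so the caller can schedule a redo of the map task.
     */
    public static boolean serveMapOutput(String mapTaskId,
                                         Path mapOutputFile,
                                         OutputStream toReducer,
                                         LostOutputListener listener) {
        try {
            // Copy the whole file; a missing or unreadable file throws IOException.
            Files.copy(mapOutputFile, toReducer);
            return true;
        } catch (IOException e) {
            // Assumption: any I/O failure is treated as a lost map output.
            listener.mapOutputLost(mapTaskId, e.toString());
            return false;
        }
    }
}
```

In this sketch the only recovery path is the callback, which matches the redo strategy: nothing is replicated, so a lost file means the map task has to run again.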

It's true that Hadoop uses a redo strategy to solve this problem, but some
papers suggest that we can also replicate the intermediate results to make
them fault-tolerant.
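
The redo-versus-replicate trade-off those papers describe could be sketched roughly as below. This is only an illustration, not anything Hadoop does: the class RedoOrReplicate, the method shouldReplicate(), and the idea of weighting the redo cost by a loss probability are all my assumptions.

```java
public class RedoOrReplicate {

    /**
     * Decides whether to replicate intermediate map output.
     *
     * Assumption: the redo cost is only paid if the output is actually lost,
     * so it is weighted by lossProbability; the replication cost is paid
     * up front for every map output.
     *
     * @param lossProbability        chance (0.0 to 1.0) that the output is lost
     * @param redoCostSeconds        time to re-run the map task
     * @param replicationCostSeconds time to write the extra replicas
     * @return true if the expected redo cost exceeds the replication cost
     */
    public static boolean shouldReplicate(double lossProbability,
                                          double redoCostSeconds,
                                          double replicationCostSeconds) {
        if (lossProbability < 0.0 || lossProbability > 1.0) {
            throw new IllegalArgumentException("lossProbability must be in [0, 1]");
        }
        double expectedRedoCost = lossProbability * redoCostSeconds;
        return expectedRedoCost > replicationCostSeconds;
    }
}
```

For example, with a 10% loss probability, a 600-second map task, and 30 seconds to replicate, the expected redo cost (60 seconds) is higher, so this sketch would choose replication; with a 1% loss probability it would choose redo.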

Thank you very much

Nan

On Wed, Oct 13, 2010 at 4:07 PM, newpant <[EMAIL PROTECTED]> wrote:

> Hi, according to Hadoop: The Definitive Guide, a map task first stores its
> intermediate output in an in-memory buffer and then spills it to the local
> disk directories configured by mapred.local.dir. So as far as I know, if the
> intermediate data is lost, only a redo can fix it.
>
> If I'm wrong, please correct me.
>
> 2010/9/27 Nan Zhu <[EMAIL PROTECTED]>
>
> > Hi, all
> >
> > I'm not sure which mailing list I should send my question to, so sorry
> > for any inconvenience.
> >
> > I'm interested in how Hadoop currently handles the loss of intermediate
> > data generated by map tasks. Some papers suggest that, when the data
> > needed by reducers is lost, we should compare the cost of redoing the task
> > with the cost of replicating the data. If redoing the task costs more, we
> > can keep more replicas of the intermediate data generated by the map tasks
> > to ensure that reducers can access it; otherwise, we just redo the
> > corresponding map task when we detect the loss.
> >
> > I'm not sure what strategy Hadoop currently adopts, and I haven't found
> > the code for this function. Can anyone give me some suggestions?
> >
> > Thank you
> >
> > Nan
> >
>