Hadoop >> mail # dev >> Available of Intermediate data generated by mappers


Re: Available of Intermediate data generated by mappers
Yes, I finally found the corresponding code.

It's in TaskTracker.MapOutputServlet:
doGet() -> sendMapFile() -> TaskTracker.mapOutputLost()

It's true that Hadoop uses a redo strategy to solve this problem, but some
papers indicate that we can also replicate the intermediate results to make
them fault-tolerant.
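
To make the redo-versus-replicate trade-off concrete, here is a minimal
sketch of the kind of cost comparison those papers describe. This is not
Hadoop code: the class name, the method names, and the simple cost model
(expected re-run time against copy time) are all assumptions made for
illustration.

```java
/**
 * Hypothetical sketch, not part of Hadoop: decide whether replicating a
 * map task's intermediate output is cheaper than re-running the task
 * when the output is lost.
 */
public final class IntermediateDataPolicy {
    private IntermediateDataPolicy() {}

    /** Expected seconds spent re-running the map task (assumed model: p * runtime). */
    public static double expectedRedoCost(double lossProbability, double mapRuntimeSeconds) {
        return lossProbability * mapRuntimeSeconds;
    }

    /** Seconds spent copying the map output to the extra nodes (assumed model). */
    public static double replicationCost(long outputBytes, int extraReplicas, double bytesPerSecond) {
        return extraReplicas * (outputBytes / bytesPerSecond);
    }

    /** Replicate only when the expected redo cost exceeds the copy cost. */
    public static boolean shouldReplicate(double lossProbability, double mapRuntimeSeconds,
                                          long outputBytes, int extraReplicas,
                                          double bytesPerSecond) {
        return expectedRedoCost(lossProbability, mapRuntimeSeconds)
                > replicationCost(outputBytes, extraReplicas, bytesPerSecond);
    }

    public static void main(String[] args) {
        // Long map task with small output: copying is cheap compared to a re-run.
        System.out.println(shouldReplicate(0.05, 600.0, 64_000_000L, 1, 100_000_000.0));
        // Short map task with large output: re-running is the cheaper option.
        System.out.println(shouldReplicate(0.01, 10.0, 1_000_000_000L, 1, 100_000_000.0));
    }
}
```

With the default redo strategy there is nothing to decide: the map task is
simply re-executed once the loss is detected.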

Thank you very much

Nan

On Wed, Oct 13, 2010 at 4:07 PM, newpant <[EMAIL PROTECTED]> wrote:

> Hi, according to Hadoop: The Definitive Guide, a map task first writes its
> intermediate output to an in-memory buffer and then spills it to the local
> disk directories configured by mapred.local.dir. So as far as I know, if the
> intermediate data is lost, only a redo can fix it.
>
> If I'm wrong, please correct me.
>
> 2010/9/27 Nan Zhu <[EMAIL PROTECTED]>
>
> > Hi, all
> >
> > I'm not sure which mailing list I should send my question to; sorry for
> > any inconvenience.
> >
> > I'm interested in how Hadoop currently handles the loss of intermediate
> > data generated by map tasks. As some papers suggest, when the data needed
> > by reducers is lost, we should compare the cost of redoing the task with
> > the cost of replicating the data. If redoing the task costs more, we can
> > keep more replicas of the intermediate data generated by the map tasks to
> > ensure that reducers can access it; otherwise, we just redo the
> > corresponding map task when we detect the loss.
> >
> > I'm not sure which strategy Hadoop currently adopts, and I haven't found
> > the code for this. Can anyone give me some suggestions?
> >
> > Thank you
> >
> > Nan
> >
>