-
Availability of intermediate data generated by mappers
Nan Zhu 2010-09-27, 05:35
Hi all,
I'm not sure which mailing list this question belongs on, so apologies for any inconvenience.
I'm interested in how Hadoop currently handles the loss of intermediate data generated by map tasks. Some papers suggest that when the data needed by reducers is lost, we should compare the cost of redoing the task with the cost of replicating the data. If redoing the task costs more, we can keep more replicas of the map's intermediate output so that reducers can still access it; otherwise, we simply redo the corresponding map task when the loss is detected.
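The comparison described above could be sketched roughly as follows. This is only an illustration, not Hadoop code: the class, the enum, and the parameter names are hypothetical, and weighting the redo cost by an assumed loss probability is one possible reading of "compare the cost", not something the papers or Hadoop prescribe.

```java
// Hypothetical sketch: none of these names come from Hadoop.
final class IntermediateDataPolicy {
    enum Strategy { REEXECUTE_ON_LOSS, REPLICATE_OUTPUT }

    /**
     * Chooses how to protect one map task's intermediate output.
     *
     * @param lossProbability assumed chance (0..1) that the output is lost before reducers fetch it
     * @param reexecutionCost estimated cost of rerunning the map task (any consistent unit)
     * @param replicationCost estimated cost of writing the extra replicas (same unit)
     */
    static Strategy choose(double lossProbability, double reexecutionCost, double replicationCost) {
        if (lossProbability < 0.0 || lossProbability > 1.0) {
            throw new IllegalArgumentException("lossProbability must be in [0, 1]");
        }
        // Assumption: redo is only paid when a loss actually happens,
        // while replication is paid up front for every map output.
        double expectedRedoCost = lossProbability * reexecutionCost;
        return expectedRedoCost > replicationCost
                ? Strategy.REPLICATE_OUTPUT
                : Strategy.REEXECUTE_ON_LOSS;
    }
}
```

For example, with these made-up numbers, choose(0.1, 1000.0, 50.0) picks replication because the expected redo cost (100) exceeds the replication cost (50), while choose(0.01, 1000.0, 50.0) picks re-execution.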
I'm not sure which strategy Hadoop currently adopts, and I haven't found the code for this. Can anyone give me some suggestions?
Thank you
Nan
-
Re: Availability of intermediate data generated by mappers
newpant 2010-10-13, 08:07
Hi, according to Hadoop: The Definitive Guide, a map task first writes its intermediate output to an in-memory buffer and then spills it to local disk, in the directories configured by mapred.local.dir. So as far as I know, if the intermediate data is lost, only redoing the map task can recover it.
If I'm wrong, please correct me.
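To make the buffer-then-spill behaviour concrete, here is a minimal sketch under stated assumptions. It is not Hadoop's actual map output buffer: the class SpillingBuffer, its methods, the byte threshold, and the spill file names are all hypothetical, and the directory passed in merely stands in for a location under mapred.local.dir.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; this is not Hadoop's real map-side buffer.
final class SpillingBuffer {
    private final Path localDir;       // stands in for a directory under mapred.local.dir
    private final int thresholdBytes;  // spill once buffered bytes reach this size
    private final List<String> buffered = new ArrayList<>();
    private final List<Path> spillFiles = new ArrayList<>();
    private int bufferedBytes = 0;

    SpillingBuffer(Path localDir, int thresholdBytes) {
        this.localDir = localDir;
        this.thresholdBytes = thresholdBytes;
    }

    // Convenience factory for experiments: spills into a fresh temporary directory.
    static SpillingBuffer withTempDir(int thresholdBytes) {
        try {
            return new SpillingBuffer(Files.createTempDirectory("map-spill"), thresholdBytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Buffer one record in memory; spill to local disk when the threshold is reached.
    void collect(String record) {
        buffered.add(record);
        bufferedBytes += record.getBytes(StandardCharsets.UTF_8).length;
        if (bufferedBytes >= thresholdBytes) {
            spill();
        }
    }

    // Write the buffered records to a new local file and clear the in-memory buffer.
    void spill() {
        if (buffered.isEmpty()) {
            return;
        }
        Path file = localDir.resolve("spill" + spillFiles.size() + ".out");
        try {
            Files.write(file, buffered, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        spillFiles.add(file);
        buffered.clear();
        bufferedBytes = 0;
    }

    List<Path> spillFiles() { return spillFiles; }

    int bufferedRecords() { return buffered.size(); }
}
```

Because the spill files live only on the local disk of the node that ran the map task, losing that disk or node loses the output, which is why re-execution is the only recovery path in this scheme.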
-
Re: Availability of intermediate data generated by mappers
Nan Zhu 2010-10-13, 15:04
Yes, I finally found the corresponding code.
It's in TaskTracker.MapOutputServlet: doGet() -> sendMapFile() -> TaskTracker.mapOutputLost()
It's true that Hadoop uses the redo strategy to solve this problem, but some papers indicate that we can also replicate the intermediate results to make them fault-tolerant.
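The redo strategy can be pictured with a small bookkeeping sketch. This is not the TaskTracker code: the class MapOutputTracker and its methods are hypothetical, and the only idea borrowed from the thread is that a reported lost map output causes the corresponding map task to be scheduled again.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the "redo" strategy; names are not from Hadoop.
final class MapOutputTracker {
    private final Set<String> completedMaps = new HashSet<>();
    private final Set<String> mapsToRerun = new LinkedHashSet<>();

    // Record that a map task finished and its output is available locally.
    void mapCompleted(String mapTaskId) {
        completedMaps.add(mapTaskId);
        mapsToRerun.remove(mapTaskId);
    }

    // Called when serving the output to a reducer fails because the data is gone.
    // The map is no longer considered complete and is queued for re-execution.
    void mapOutputLost(String mapTaskId) {
        if (completedMaps.remove(mapTaskId)) {
            mapsToRerun.add(mapTaskId);
        }
    }

    boolean isOutputAvailable(String mapTaskId) {
        return completedMaps.contains(mapTaskId);
    }

    Set<String> mapsToRerun() {
        return mapsToRerun;
    }
}
```

A replication-based alternative would instead keep extra copies of each output and only fall back to this path when every copy is gone.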
Thank you very much
Nan