Hadoop, mail # dev - Availability of Intermediate data generated by mappers


Availability of Intermediate data generated by mappers
Nan Zhu 2010-09-27, 05:35
Hi, all

I'm not sure which mailing list this question belongs on, so apologies for
any inconvenience.

I'm interested in how Hadoop currently handles the loss of intermediate data
generated by map tasks. Some papers suggest that, when data needed by
reducers is lost, we should compare the cost of redoing the map task with the
cost of replicating its output. If redoing the task costs more, we can keep
additional replicas of the intermediate data so that reducers can still
access it; otherwise, we simply redo the corresponding map task when the loss
is detected.
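
To make the comparison concrete, here is a rough sketch of the decision rule as I understand it from those papers. It is not Hadoop code: every class, method, and parameter name is hypothetical, and weighting the redo cost by a loss probability is my own assumption about how the two costs might be made comparable.

```java
// Hypothetical sketch only: none of these names exist in Hadoop.
public class IntermediateDataPolicy {

    enum Strategy { REEXECUTE_ON_LOSS, REPLICATE_OUTPUT }

    // Expected cost (seconds) of relying on re-execution: the map task
    // is rerun only if its output is actually lost (assumed model).
    static double expectedReexecutionCost(double lossProbability,
                                          double mapRuntimeSeconds) {
        return lossProbability * mapRuntimeSeconds;
    }

    // Cost (seconds) of writing extra copies up front; this is paid
    // whether or not the output is ever lost.
    static double replicationCost(long outputBytes, int extraReplicas,
                                  double bytesPerSecond) {
        return extraReplicas * (outputBytes / bytesPerSecond);
    }

    // Replicate only when redoing the task is expected to cost more.
    static Strategy choose(double lossProbability, double mapRuntimeSeconds,
                           long outputBytes, int extraReplicas,
                           double bytesPerSecond) {
        double redo = expectedReexecutionCost(lossProbability, mapRuntimeSeconds);
        double copy = replicationCost(outputBytes, extraReplicas, bytesPerSecond);
        return redo > copy ? Strategy.REPLICATE_OUTPUT : Strategy.REEXECUTE_ON_LOSS;
    }

    public static void main(String[] args) {
        // Made-up numbers: 1 GB of map output, one extra replica, 100 MB/s.
        long oneGb = 1_000_000_000L;
        double bandwidth = 100_000_000.0;
        // A 60-second map task with a 5% chance of losing its output.
        System.out.println(choose(0.05, 60.0, oneGb, 1, bandwidth));
        // A one-hour map task with the same loss probability.
        System.out.println(choose(0.05, 3600.0, oneGb, 1, bandwidth));
    }
}
```

With these made-up numbers, the short task is cheaper to redo and the long task is cheaper to replicate, which is the trade-off I am asking about.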

I'm not sure which strategy Hadoop currently adopts, and I haven't found the
code that implements it. Can anyone give me some suggestions?

Thank you

Nan