Re: distributed cache
Thanks, Kai. What is the purpose of using a higher replication count?


On Sat, Dec 22, 2012 at 8:44 PM, Kai Voigt <[EMAIL PROTECTED]> wrote:

> Hi,
> On 22.12.2012 at 13:03, Lin Ma <[EMAIL PROTECTED]> wrote:
> > I want to confirm that when a mapper or reducer on a task node accesses a
> > distributed cache file, the file resides on disk, not in memory. I just
> > want to make sure the distributed cache file is not fully loaded into
> > memory, where it would compete with the mapper/reducer tasks for memory.
> > Is that correct?
> Yes, you are correct. The JobTracker will put files for the distributed
> cache into HDFS with a higher replication count (10 by default). Whenever a
> TaskTracker needs those files for a task it is launching locally, it will
> fetch a copy to its local disk. So it won't need to do this again for
> future tasks on this node. After a job is done, all local copies and the
> HDFS copies of files in the distributed cache are cleaned up.
> Kai
> --
> Kai Voigt
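
Since the localized copy is an ordinary file on the node's local disk, how much memory it takes up depends on how the task reads it. Below is a minimal sketch in plain Java, assuming the cached file is tab-separated text; the class name CachedFileLookup and the method countLinesWithKey are hypothetical names made up for this illustration and are not part of the Hadoop API. It scans the file line by line instead of loading it whole:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop: scans a localized cache file
// line by line so that only the current line is held in memory.
final class CachedFileLookup {

    // Counts the lines whose first tab-separated field equals the given key.
    // Assumption: the cached file is UTF-8 text with one record per line.
    static long countLinesWithKey(Path localFile, String key) throws IOException {
        long matches = 0;
        try (BufferedReader reader =
                 Files.newBufferedReader(localFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                String field = tab >= 0 ? line.substring(0, tab) : line;
                if (field.equals(key)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: local path of the cached file, args[1]: key to look for.
        System.out.println(countLinesWithKey(Paths.get(args[0]), args[1]));
    }
}
```

A task that instead reads the whole file into a map or list during setup would hold all of it in its own heap, so that choice stays with the task code.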