MapReduce, mail # user - Re: How is sharing done in HDFS ?


Re: How is sharing done in HDFS ?
Kun Ling 2013-05-22, 09:41
Hi Agarwal,
    Thanks to Harsh J's reply, I have found the following code (based on
hadoop-1.0.4) that may give you some help:

   localizeJobTokenFile() in TaskTracker.java, which localizes a file named
"jobToken".
   localizeJobConfFile() in TaskTracker.java, which localizes a file named
"job.xml".
   Some distributed cache files are also localized, by calling
taskDistributedCacheManager.setupCache().

   All of the above functions are called in the initializeJob() method of
TaskTracker.java.
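
   To make the mechanism concrete, here is a minimal sketch of what that
localization boils down to: a plain HDFS-to-local copy, no NFS involved.
This is my own illustration, not the actual TaskTracker code, and the two
paths are hypothetical (the real ones are derived from the JobConf):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Every TaskTracker pulls the job-specific files out of the shared
    // HDFS system directory onto its own local disk before the task runs.
    public class LocalizeSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);        // the shared HDFS
        FileSystem local = FileSystem.getLocal(conf);  // tasktracker-local disk

        // Hypothetical locations, for illustration only.
        Path systemJobDir = new Path("/tmp/hadoop/mapred/system/job_0001");
        Path localJobDir  = new Path("/tmp/mapred/local/jobcache/job_0001");
        local.mkdirs(localJobDir);

        // The heart of localizeJobTokenFile() / localizeJobConfFile():
        // copy each shared file down from HDFS to the local job directory.
        hdfs.copyToLocalFile(new Path(systemJobDir, "jobToken"),
                             new Path(localJobDir, "jobToken"));
        hdfs.copyToLocalFile(new Path(systemJobDir, "job.xml"),
                             new Path(localJobDir, "job.xml"));
      }
    }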

And the jobToken file is copied from the directory returned by
jobClient.getSystemDir(), which is initialized as a shared directory in
HDFS in offerService() of TaskTracker.java.
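
   (As an aside: the jobClient above is the TaskTracker's RPC proxy to the
JobTracker, but the user-side JobClient class exposes the same location, so
you can print it yourself. A minimal sketch, assuming a reachable cluster
configuration on the classpath:)

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    // Prints the shared HDFS directory that job files are staged in; the
    // TaskTracker learns the same location over RPC in offerService().
    public class SystemDirSketch {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();        // picks up mapred-site.xml etc.
        JobClient client = new JobClient(conf);
        Path systemDir = client.getSystemDir();
        System.out.println("shared system dir: " + systemDir);
      }
    }
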
  To Harsh: after looking into the source code (based on hadoop-1.0.4), I
have the following questions:
    1. Where is job.xml stored in the shared HDFS? Looking into the code, I
only found the readFields(DataInput in) method of class Task in Task.java,
and the only relevant statement is "jobFile = Text.readString(in)" (see the
small Writable sketch after these questions).

   2. There are also a _partition.lst file and a job.jar file, which are
likewise shared by all the tasks, but I do not find any code that localizes
them. Do you know what code in which file makes the _partition.lst
localization happen?

   3. Are there any other files that need to be shared, besides the
jobToken, job.xml, the distributed cache files, _partition.lst, and job.jar?

   4. All of the above observations are based on the Hadoop 1.0.4 source
code. Has any of this changed in the latest hadoop-2.0-alpha, or in Hadoop
trunk?
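
   (The Writable sketch mentioned in question 1: a toy class of mine that
mirrors the quoted statement. The point is that only the *path* of job.xml
is serialized into the task, not the file contents, which is why the file
itself has to be localized separately.)

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;

    // Toy Writable carrying just a job.xml path; the real Task class
    // serializes many more fields the same way.
    public class JobFileRef implements Writable {
      private String jobFile = "";

      public void write(DataOutput out) throws IOException {
        Text.writeString(out, jobFile);
      }

      public void readFields(DataInput in) throws IOException {
        jobFile = Text.readString(in);  // the statement quoted from Task.java
      }
    }
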
On Wed, May 22, 2013 at 4:45 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> The job-specific files, placed by the client, are downloaded individually
> by every tasktracker from HDFS (the process is called "localization" of
> the task before it starts up) and then used.
>
>
> On Wed, May 22, 2013 at 1:59 PM, Agarwal, Nikhil <
> [EMAIL PROTECTED]> wrote:
>
>>  Hi,
>>
>> Can anyone guide me to some pointers or explain how HDFS shares the
>> information put in the temporary directories (hadoop.tmp.dir,
>> mapred.tmp.dir, etc.) with all other nodes?
>>
>> I suppose that during execution of a MapReduce job, the JobTracker
>> prepares a file called jobtoken and puts it in the temporary directories,
>> which needs to be read by all TaskTrackers. So, how does HDFS share the
>> contents? Does it use an NFS mount or ….?
>>
>> Thanks & Regards,
>>
>> Nikhil
>>
>>
>
>
>
> --
> Harsh J
>

--
http://www.lingcc.com