HDFS, mail # user - JobToken not Found when Integrating Hadoop/Lustre


JobToken not Found when Integrating Hadoop/Lustre
Parker, Matthew - IS 2012-12-19, 16:37
I'm trying to replace HDFS with Lustre, and I'm running into configuration problems when I run teragen from the TeraSort benchmark (see the stack trace below). I followed the Hadoop-on-Lustre setup directions published on the Lustre wiki (http://wiki.lustre.org/images/1/1b/Hadoop_wp_v0.4.2.pdf), which indicate that everything should work once two variables are set: fs.default.name and mapred.local.dir (see the excerpt below).

>> To run Hadoop over Lustre file system, first of all Lustre should installed on every node in
>> the cluster and mounted at the same path such as /Lustre. Modify the configuration which
>> Hadoop used to build the file system. Give the path where Lustre was mounted to the
>> variable ‘fs.default.name’. And ‘mapred.local.dir’ should be set to an independent
>> directory. When running job, just start JobTracker and TaskTracker. In this means,
>> Hadoop will use Lustre file system to store all information.

I'm using hadoop-0.20.2-cdh3u4. Here are my configuration settings:

********* core-site.xml **************

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
   <property>
        <name>fs.default.name</name>
        <value>file:///lustre/site-h/tmp/susan</value>
   </property>
   <property>
        <name>mapred.system.dir</name>
        <value>${fs.default.name}/hadoop_tmp/mapred/system</value>
        <description>The shared directory where MapReduce stores control files.</description>
   </property>
</configuration>

************ hdfs-site.xml ************

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
   <property>
        <name>dfs.http.address</name>
        <value>namenode.ld.net:50070</value>
   </property>
   <property>
        <name>dfs.secondaryhttp.address</name>
        <value>secnamenode.ld.net:50090</value>
   </property>
   <property>
        <name>dfs.replication</name>
        <value>1</value>
   </property>
   <property>
        <name>dfs.permissions</name>
        <value>false</value>
   </property>
   <property>
        <name>dfs.name.dir</name>
        <value>file:///lustre/site-h/tmp/susan/hdfs/name</value>
        <!-- value>/lustre/site-h/tmp/susan/${hostname}/hdfs/name</value -->
   </property>
   <property>
        <name>dfs.data.dir</name>
        <value>file:///lustre/site-h/tmp/susan/data1/hdfs/data,file:///lustre/site-h/tmp/susan/data2/hdfs/data,file:///lustre/site-h/tmp/susan/data3/hdfs/data</value>
   </property>
</configuration>

************* mapred-site.xml ******************

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>mapred.jobtracker.taskScheduler</name>
        <value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>jobtracker.ld.net:8021</value>
    </property>
    <property>
        <name>mapred.job.tracker.http.address</name>
        <value>jobtracker.ld.net:50030</value>
    </property>
    <property>
        <name>mapred.local.dir</name>
        <value>/srv/cloud/hadoop/cache/hadoop/mapred</value>
        <final>true</final>
    </property>
    <property>
        <name>mapred.reduce.tasks</name>
        <value>7</value>
    </property>
    <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>8</value>
        <final>true</final>
    </property>
    <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>8</value>
        <final>true</final>
    </property>
    <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx1024M</value>
    </property>
</configuration>

************************************************
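
One thing that may be worth checking in the settings above is that mapred.system.dir is defined in core-site.xml rather than mapred-site.xml. The sketch below is not Hadoop code; it only models, under stated assumptions, how layered configuration files could interact. It assumes that resources are applied in the order core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, that a later resource overrides an earlier one unless the earlier value was marked final, and that the default values shown are the ones shipped with Hadoop 0.20 (recalled from memory, so verify them against the installed default files). The RESOURCES table and the merge function are hypothetical names introduced for illustration.

```python
# Ordered list of (resource name, {property: (value, is_final)}).
# Only properties relevant to this question are listed. The two *-default.xml
# entries are assumptions about Hadoop 0.20 defaults, not copied from the cluster.
RESOURCES = [
    ("core-default.xml", {
        "hadoop.tmp.dir": ("/tmp/hadoop-${user.name}", False),
    }),
    ("core-site.xml", {
        "fs.default.name": ("file:///lustre/site-h/tmp/susan", False),
        "mapred.system.dir": ("${fs.default.name}/hadoop_tmp/mapred/system", False),
    }),
    ("mapred-default.xml", {
        "mapred.system.dir": ("${hadoop.tmp.dir}/mapred/system", False),
    }),
    ("mapred-site.xml", {
        "mapred.local.dir": ("/srv/cloud/hadoop/cache/hadoop/mapred", True),
    }),
]


def merge(resources):
    """Apply resources in order and return {property: (raw value, source file)}."""
    merged = {}
    final_names = set()
    for source, props in resources:
        for name, (value, is_final) in props.items():
            if name in final_names:
                continue  # an earlier final value is not overridden
            merged[name] = (value, source)
            if is_final:
                final_names.add(name)
    return merged


effective = merge(RESOURCES)
value, source = effective["mapred.system.dir"]
print("mapred.system.dir = %s (from %s)" % (value, source))
```

If those assumptions hold, the value that survives the merge is the one from mapred-default.xml, not the Lustre path from core-site.xml. Moving the property into mapred-site.xml, or marking it final where it is, would be the obvious things to try.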

The system runs fine when integrated with HDFS, but with the Lustre settings above, the teragen command below fails with the stack trace shown at the end of this message:

su -s /bin/bash -c 'hadoop jar /usr/lib/hadoop-0.20/hadoop-0.20.2-cdh3u4-examples.jar teragen -Dmapred.map.tasks=152 100000 file:///lustre/site-h/tmp/mapred/terasort' mapred

The jobToken file named in the error doesn't exist on the system, but the following directory does:

file:/tmp/hadoop-mapred/mapred/system
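
That directory looks like what the stock mapred.system.dir would expand to for the mapred user, which may mean the Lustre value is not the one in effect. The sketch below mimics Hadoop-style `${var}` expansion in a simplified form to compare the two candidates. It is not Hadoop's implementation. It assumes the 0.20 defaults are `${hadoop.tmp.dir}/mapred/system` for mapred.system.dir and `/tmp/hadoop-${user.name}` for hadoop.tmp.dir (recalled from memory, so check the installed default files), and that JVM system properties such as user.name are consulted before configuration properties. The resolve function and the dictionaries are hypothetical names used only for illustration.

```python
import re

# Matches one ${name} reference.
VAR = re.compile(r"\$\{([^}$\s]+)\}")


def resolve(raw, props, system_props, max_depth=20):
    """Expand ${var} references in raw, trying system_props first, then props."""
    value = raw
    for _ in range(max_depth):
        match = VAR.search(value)
        if match is None:
            return value
        key = match.group(1)
        replacement = system_props.get(key, props.get(key))
        if replacement is None:
            return value  # unknown variable: leave the text unexpanded
        value = value[:match.start()] + replacement + value[match.end():]
    raise ValueError("too many nested substitutions in %r" % raw)


# Assumed defaults plus the fs.default.name from core-site.xml in this post.
props = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "fs.default.name": "file:///lustre/site-h/tmp/susan",
}
system_props = {"user.name": "mapred"}  # the job is submitted as user mapred

default_dir = resolve("${hadoop.tmp.dir}/mapred/system", props, system_props)
lustre_dir = resolve("${fs.default.name}/hadoop_tmp/mapred/system", props, system_props)
print("default :", default_dir)
print("intended:", lustre_dir)
```

Under those assumptions the default expands to /tmp/hadoop-mapred/mapred/system, the same path that appears in the error, while the intended value expands to a directory under /lustre/site-h/tmp/susan. A /tmp path on the local file system is not shared between nodes, which would fit a TaskTracker failing to find a file that the JobTracker wrote on its own machine.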

Any help you can provide would be greatly appreciated.

============ Stacktrace Running Teragen ==============================
[root@jobtracker ~]# ./teragen.sh
Deleted file:/lustre/site-h/tmp/mapred/terasort
12/12/19 11:22:19 INFO util.NativeCodeLoader: Loaded the native-hadoop library
Generating 100000 using 152 maps with step of 657
12/12/19 11:22:19 INFO mapred.JobClient: Running job: job_201212190955_0001
12/12/19 11:22:20 INFO mapred.JobClient:  map 0% reduce 0%
12/12/19 11:22:20 INFO mapred.JobClient: Task Id : attempt_201212190955_0001_m_000153_0, Status : FAILED
Error initializing attempt_201212190955_0001_m_000153_0:
java.io.FileNotFoundException: File file:/tmp/hadoop-mapred/mapred/system/job_201212190955_0001/jobToken does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:408)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
        at org.apache.hadoop.mapred.TaskTracker.localizeJobTokenFile(TaskTracker.java:4529)
        at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1321)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1262)
        at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2602)
        at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2566)

12/12/19 11:22:20 WARN mapred.JobClient: Error reading task outputhttp://r01svr6.ld.net:50060/tasklog?plaintext=true&attemptid=attempt_201212190955_0001_m_000153_0&filter=stdout
12/12/19 11:22:20 WARN mapred.JobClient: Error reading task outputhttp://r01svr6.ld.net:50060/tasklog?plaintext=true&attemptid=attempt_201212190955_0001_m_000153_0&filter=stderr
12/12/19 11:22:20 INFO mapred.JobClient: Task Id : attempt_201212190955_0001_r