JobToken not Found when Integrating Hadoop/Lustre
Parker, Matthew - IS 2012-12-19, 16:37
I'm trying to replace HDFS with Lustre, and I'm having configuration issues running teragen from the TeraSort benchmark (see stack trace below). I followed the directions in the Hadoop-on-Lustre paper on the Lustre wiki (http://wiki.lustre.org/images/1/1b/Hadoop_wp_v0.4.2.pdf), which indicate that everything should work once two variables are set: fs.default.name and mapred.local.dir (see excerpt below).

>> To run Hadoop over Lustre file system, first of all Lustre should installed on every node in
>> the cluster and mounted at the same path such as /Lustre. Modify the configuration which
>> Hadoop used to build the file system. Give the path where Lustre was mounted to the
>> variable ‘fs.default.name’. And ‘mapred.local.dir’ should be set to an independent
>> directory. When running job, just start JobTracker and TaskTracker. In this means,
>> Hadoop will use Lustre file system to store all information.

I'm using hadoop-0.20.0-cdh3u4. Here are my configuration settings:

********* core-site.xml **************
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:///lustre/site-h/tmp/susan</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>${fs.default.name}/hadoop_tmp/mapred/system</value>
    <description>The shared directory where MapReduce stores control files.</description>
  </property>
</configuration>

************ hdfs-site.xml ************
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.http.address</name>
    <value>namenode.ld.net:50070</value>
  </property>
  <property>
    <name>dfs.secondaryhttp.address</name>
    <value>secnamenode.ld.net:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///lustre/site-h/tmp/susan/hdfs/name</value>
    <!-- value>/lustre/site-h/tmp/susan/${hostname}/hdfs/name</value -->
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///lustre/site-h/tmp/susan/data1/hdfs/data,file:///lustre/site-h/tmp/susan/data2/hdfs/data,file:///lustre/site-h/tmp/susan/data3/hdfs/data</value>
  </property>
</configuration>

************* mapred-site.xml ******************
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.ld.net:8021</value>
  </property>
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>jobtracker.ld.net:50030</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/srv/cloud/hadoop/cache/hadoop/mapred</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>7</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>8</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024M</value>
  </property>
</configuration>
************************************************

The system runs fine when integrated with HDFS, but I get the following stack trace when running the following teragen command:

su -s /bin/bash -c 'hadoop jar /usr/lib/hadoop-0.20/hadoop-0.20.2-cdh3u4-examples.jar teragen -Dmapred.map.tasks=152 100000 file:///lustre/site-h/tmp/mapred/terasort' mapred

The file doesn't exist on the system, but the following directory is there:

file:/tmp/hadoop-mapred/mapred/system

Any help you can provide would be greatly appreciated.
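
One observation on the directory named above: file:/tmp/hadoop-mapred/mapred/system is the location that the stock defaults for hadoop.tmp.dir and mapred.system.dir appear to produce for the user mapred, not the Lustre path set in core-site.xml. This may mean the mapred.system.dir override is not the value in effect on the TaskTracker. The Python sketch below only imitates Hadoop-style ${...} substitution to show the two outcomes side by side. The expand helper and both dictionaries are hypothetical illustrations, not Hadoop APIs, and the default values are assumed from the stock core-default.xml and mapred-default.xml of this Hadoop line.

```python
import re

# Assumed stock defaults (core-default.xml / mapred-default.xml); not read from a cluster.
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "mapred.system.dir": "${hadoop.tmp.dir}/mapred/system",
}

# Values copied from the core-site.xml shown in this message.
SITE = {
    "fs.default.name": "file:///lustre/site-h/tmp/susan",
    "mapred.system.dir": "${fs.default.name}/hadoop_tmp/mapred/system",
}

VAR = re.compile(r"\$\{([^}]+)\}")


def expand(value, props, max_depth=20):
    """Repeatedly substitute ${name} references from props until nothing changes.

    Unknown names are left in place. max_depth bounds the number of passes so
    a self-referencing property cannot loop forever.
    """
    for _ in range(max_depth):
        expanded = VAR.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value


def effective_system_dir(site_props, user):
    """Resolve mapred.system.dir with site values layered over the assumed defaults."""
    props = dict(DEFAULTS)
    props.update(site_props)
    props["user.name"] = user
    return expand("${mapred.system.dir}", props)


if __name__ == "__main__":
    # Site file not picked up: falls back to the assumed defaults.
    print(effective_system_dir({}, "mapred"))
    # Site file picked up: resolves under the Lustre mount.
    print(effective_system_dir(SITE, "mapred"))
```

If the first result is the one a daemon reports, it is worth checking which configuration directory the JobTracker and TaskTracker actually load on each node.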

============ Stacktrace Running Teragen ==============================
[root@jobtracker ~]# ./teragen.sh
Deleted file:/lustre/site-h/tmp/mapred/terasort
12/12/19 11:22:19 INFO util.NativeCodeLoader: Loaded the native-hadoop library
Generating 100000 using 152 maps with step of 657
12/12/19 11:22:19 INFO mapred.JobClient: Running job: job_201212190955_0001
12/12/19 11:22:20 INFO mapred.JobClient: map 0% reduce 0%
12/12/19 11:22:20 INFO mapred.JobClient: Task Id : attempt_201212190955_0001_m_000153_0, Status : FAILED
Error initializing attempt_201212190955_0001_m_000153_0:
java.io.FileNotFoundException: File file:/tmp/hadoop-mapred/mapred/system/job_201212190955_0001/jobToken does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:408)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
        at org.apache.hadoop.mapred.TaskTracker.localizeJobTokenFile(TaskTracker.java:4529)
        at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1321)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1262)
        at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2602)
        at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2566)
12/12/19 11:22:20 WARN mapred.JobClient: Error reading task outputhttp://r01svr6.ld.net:50060/tasklog?plaintext=true&attemptid=attempt_201212190955_0001_m_000153_0&filter=stdout
12/12/19 11:22:20 WARN mapred.JobClient: Error reading task outputhttp://r01svr6.ld.net:50060/tasklog?plaintext=true&attemptid=attempt_201212190955_0001_m_000153_0&filter=stderr
12/12/19 11:22:20 INFO mapred.JobClient: Task Id : attempt_201212190955_0001_r