Stan Rosenberg 2013-01-04, 21:02
Hi,
Any ideas why a staging directory would suddenly become unavailable after the completion of the map phase but before the start of the reduce phase? We noticed a sporadic failure yesterday wherein all the map tasks completed successfully and all the reduce tasks failed. Upon examining task tracker logs, the following exception stack trace was revealed:
2013-01-03 02:28:17,072 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_201211150255_237458_r_000108_1:
java.io.FileNotFoundException: File does not exist: hdfs://59.bm-hadoop.prod.nym2:54310/user/apache/.staging/job_201211150255_237458/job.xml
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:562)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:207)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:157)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1371)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1352)
    at org.apache.hadoop.mapred.TaskTracker.localizeJobConfFile(TaskTracker.java:1434)
    at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1318)
    at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1242)
    at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2541)
    at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2505)
This problem doesn't seem specific to any one distribution, but for completeness, we are running CDH3u3.
Thanks!
stan
Harsh J 2013-01-05, 07:44
Hi Stan,
I'd check the NN audit logs for the file /user/apache/.staging/job_201211150255_237458/job.xml to see when it was deleted and by whom; that may give more insight.
On Sat, Jan 5, 2013 at 2:32 AM, Stan Rosenberg <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Any ideas why a staging directory would suddenly become unavailable
> after the completion of the map phase but before the start of the
> reduce phase? We noticed a sporadic failure yesterday wherein all the
> map tasks completed successfully and all the reduce tasks failed.
> Upon examining task tracker logs, the following exception stack trace
> was revealed:
>
> 2013-01-03 02:28:17,072 WARN org.apache.hadoop.mapred.TaskTracker:
> Error initializing attempt_201211150255_237458_r_000108_1:
> java.io.FileNotFoundException: File does not exist:
> hdfs://59.bm-hadoop.prod.nym2:54310/user/apache/.staging/job_201211150255_237458/job.xml
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:562)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:207)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:157)
>     at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1371)
>     at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1352)
>     at org.apache.hadoop.mapred.TaskTracker.localizeJobConfFile(TaskTracker.java:1434)
>     at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1318)
>     at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1242)
>     at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2541)
>     at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2505)
>
> This problem doesn't seem relevant to only a specific distribution,
> but for completeness we are running CDH3u3.
>
> Thanks!
>
> stan
-- Harsh J
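The audit-log check suggested above could be scripted roughly as follows. This is only a sketch: it assumes the NameNode audit log writes `key=value` fields named `ugi`, `ip`, `cmd`, `src`, `dst` and `perm` after a 23-character timestamp, and that paths contain no whitespace. The function name `find_removals` and the sample log lines (users, IPs, times) are made up for illustration and are not taken from the cluster in this thread.

```python
import re

# Assumed audit-log layout: whitespace-separated key=value fields.
FIELD_RE = re.compile(r"(ugi|ip|cmd|src|dst|perm)=(\S+)")

def find_removals(lines, target):
    """Yield (timestamp, user, cmd, src) for delete/rename events affecting target."""
    for line in lines:
        fields = dict(FIELD_RE.findall(line))
        if fields.get("cmd") not in ("delete", "rename"):
            continue
        raw_src = fields.get("src")
        if raw_src is None:
            continue
        src = raw_src.rstrip("/")
        # Removing the file itself or any parent directory makes it disappear.
        if target == src or target.startswith(src + "/"):
            yield line[:23], fields.get("ugi"), fields["cmd"], src

# Hypothetical sample lines, for illustration only.
PREFIX = " INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: "
TARGET = "/user/apache/.staging/job_201211150255_237458/job.xml"
SAMPLE = [
    "2013-01-03 02:10:00,001" + PREFIX
    + "ugi=apache\tip=/10.0.0.5\tcmd=open\tsrc=" + TARGET + "\tdst=null\tperm=null",
    "2013-01-03 02:27:30,500" + PREFIX
    + "ugi=someuser\tip=/10.0.0.9\tcmd=delete\tsrc=/user/apache/.staging\tdst=null\tperm=null",
    "2013-01-03 02:27:45,000" + PREFIX
    + "ugi=apache\tip=/10.0.0.5\tcmd=delete"
    + "\tsrc=/user/apache/.staging/job_201211150255_237459\tdst=null\tperm=null",
]

if __name__ == "__main__":
    for hit in find_removals(SAMPLE, TARGET):
        print(hit)
```

Matching on parent directories matters here because a delete of the whole staging directory would never mention job.xml by name.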
Harsh J 2013-01-07, 08:21
Thanks for following up, glad to know it is resolved!
On Mon, Jan 7, 2013 at 6:42 AM, Stan Rosenberg <[EMAIL PROTECTED]> wrote:
> On Sat, Jan 5, 2013 at 2:44 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>> I'd check the NN audit logs for the file
>> /user/apache/.staging/job_201211150255_237458/job.xml to see when/who
>> deleted it away, perhaps that would give more insight.
>>
>
> The audit logs led to a trail which revealed user error. Thanks Harsh!
-- Harsh J