-
Hadoop 0.20.2: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008131730_0001/attempt_201008131730_0001_m_000000_2/output/file.out.index in any of the configured local directories
Kevin Chen 2010-08-13, 10:16
Hello,
I've been stuck on this problem for a week already. Please share if you know what could be causing it. Thanks in advance!
Hadoop version: 0.20.2
Machines:
machine 1 - NameNode, JobTracker, SecondaryNameNode:
OS: Ubuntu
Hostname: master
ulimit -n: 10240
disk space: enough
${mapred.local.dir}'s permissions: 777

machine 2 - DataNode, TaskTracker:
OS: Ubuntu
Hostname: slave1
ulimit -n: 10240
disk space: enough
${mapred.local.dir}'s permissions: 777

machine 3 - DataNode, TaskTracker:
OS: Ubuntu
Hostname: slave2
ulimit -n: 10240
disk space: enough
${mapred.local.dir}'s permissions: 777

Configurations:
core-site.xml:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

hdfs-site.xml:
<configuration>
</configuration>

mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>

masters:
bestgembler@master

slaves:
hadoop@slave1
hadoop@slave2
Parts of the error:

Terminal:
10/08/13 17:35:03 INFO mapred.FileInputFormat: Total input paths to process : 1
10/08/13 17:35:03 INFO mapred.JobClient: Running job: job_201008131730_0001
10/08/13 17:35:04 INFO mapred.JobClient: map 0% reduce 0%
10/08/13 17:35:14 INFO mapred.JobClient: map 50% reduce 0%
10/08/13 17:35:17 INFO mapred.JobClient: map 100% reduce 0%
10/08/13 17:35:23 INFO mapred.JobClient: Task Id : attempt_201008131730_0001_m_000001_0, Status : FAILED
Map output lost, rescheduling: getMapOutput(attempt_201008131730_0001_m_000001_0,0) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008131730_0001/attempt_201008131730_0001_m_000001_0/output/file.out.index in any of the configured local directories
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2887)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:324)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
hadoop-bestgembler-jobtracker-master.log:
2010-08-13 17:32:54,127 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
2010-08-13 17:32:54,127 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9001: starting
2010-08-13 17:32:54,127 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2010-08-13 17:32:54,128 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9001: starting
2010-08-13 17:32:54,152 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9001: starting
2010-08-13 17:32:54,152 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9001: starting
2010-08-13 17:32:54,152 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9001: starting
2010-08-13 17:32:54,152 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9001: starting
2010-08-13 17:32:54,152 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001: starting
2010-08-13 17:32:54,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 9001: starting
2010-08-13 17:32:54,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9001: starting
2010-08-13 17:32:54,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9001: starting
2010-08-13 17:32:54,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9001: starting
2010-08-13 17:32:54,471 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/slave1
2010-08-13 17:32:57,603 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/slave2
2010-08-13 17:35:03,905 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201008131730_0001
2010-08-13 17:35:03,905 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201008131730_0001
2010-08-13 17:35:04,370 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201008131730_0001 = 1366. Number of splits = 2
2010-08-13 17:35:04,370 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201008131730_0001_m_000000 has split on node:/default-rack/slave2
2010-08-13 17:35:04,371 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201008131730_0001_m_000000 has split on node:/default-rack/slave1
2010-08-13 17:35:04,371 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201008131730_0001_m_000001 has split on node:/default-rack/slave2
2010-08-13 17:35:04,371 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201008131730_0001_m_000001 has split on node:/d
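The exception is raised in LocalDirAllocator.getLocalPathToRead, called from TaskTracker$MapOutputServlet.doGet (per the trace above) while map output is being served to a reducer. As I understand it, that method tries each directory configured in mapred.local.dir and fails with this message when none of them contains the file. The Java sketch below is a simplified, standard-library-only illustration of that kind of lookup, not Hadoop's code; the /disk1 and /disk2 directories are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class LocalDirLookup {

    // Simplified sketch: check each configured local directory, in order, for
    // relativeFile and return the first match. In real code `exists` would be
    // Files::exists; it is a parameter here so the demo needs no real disks.
    public static Path findInLocalDirs(List<String> localDirs, String relativeFile,
                                       Predicate<Path> exists) {
        for (String dir : localDirs) {
            // If dir is relative (e.g. "tmp/mapred/local"), candidate is relative too,
            // and Files.exists would resolve it against this JVM's working directory.
            Path candidate = Paths.get(dir).resolve(relativeFile);
            if (exists.test(candidate)) {
                return candidate;
            }
        }
        // Same wording as the DiskErrorException in the thread; an unchecked
        // exception stands in for Hadoop's class here.
        throw new IllegalStateException("Could not find " + relativeFile
                + " in any of the configured local directories");
    }

    public static void main(String[] args) {
        String rel = "taskTracker/jobcache/job_201008131730_0001/"
                + "attempt_201008131730_0001_m_000001_0/output/file.out.index";
        // Hypothetical disks; pretend the map task wrote its index under the second one.
        List<String> dirs = List.of("/disk1/mapred/local", "/disk2/mapred/local");
        Set<Path> present = Set.of(Paths.get("/disk2/mapred/local").resolve(rel));
        System.out.println(findInLocalDirs(dirs, rel, present::contains));
    }
}
```

If the directory the reader resolves differs from the one the writer used, every check fails and the "in any of the configured local directories" error is the result.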
-
Re: Hadoop 0.20.2: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008131730_0001/attempt_201008131730_0001_m_000000_2/output/file.out.index in any of the configured local directories
Hemanth Yamijala 2010-08-13, 10:23
Aside from the usual checks regarding connectivity between master and slaves, one quick observation is that hadoop.tmp.dir is set to 'tmp'. Shouldn't this be '/tmp'?
Thanks
Hemanth
-
RE: Hadoop 0.20.2: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008131730_0001/attempt_201008131730_0001_m_000000_2/output/file.out.index in any of the configured local directories
Kevin Chen 2010-08-13, 10:52
Thanks for your reply!

1. I can log in via passwordless SSH between the master and the slaves; that part is fine :-)

2.
<property>
  <name>hadoop.tmp.dir</name>
  <value>tmp</value>
</property>

In fact, 'tmp' is what I want :-) The layout I intend is:

$HADOOP_HOME
  + tmp
    + dfs
    + mapred
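This layout relies on the other directories being derived from hadoop.tmp.dir. As far as I know, the Hadoop 0.20 defaults set mapred.local.dir to ${hadoop.tmp.dir}/mapred/local and dfs.data.dir to ${hadoop.tmp.dir}/dfs/data, so a relative hadoop.tmp.dir makes those settings relative as well. A simplified Java sketch of that ${...} substitution (an illustration, not Hadoop's Configuration class):

```java
import java.nio.file.Paths;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TmpDirExpansion {
    // Matches ${name} references inside a property value.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Single-pass substitution: each ${name} is replaced by props[name];
    // unknown names are left untouched.
    public static String expand(String value, Map<String, String> props) {
        Matcher m = VAR.matcher(value);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String replacement = props.getOrDefault(m.group(1), m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // hadoop.tmp.dir value taken from the core-site.xml in this thread.
        Map<String, String> props = Map.of("hadoop.tmp.dir", "tmp");
        String localDir = expand("${hadoop.tmp.dir}/mapred/local", props);
        String dataDir = expand("${hadoop.tmp.dir}/dfs/data", props);
        System.out.println(localDir);                          // tmp/mapred/local
        System.out.println(dataDir);                           // tmp/dfs/data
        System.out.println(Paths.get(localDir).isAbsolute());  // false
    }
}
```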
-
Re: Hadoop 0.20.2: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008131730_0001/attempt_201008131730_0001_m_000000_2/output/file.out.index in any of the configured local directories
Hemanth Yamijala 2010-08-13, 11:05
Hi,
> 1. I login through SSH without password from master and slaves, it's all right :-)
>
> 2.
> <property>
> <name>hadoop.tmp.dir</name>
> <value>tmp</value>
> </property>
>
> In fact, 'tmp' is what I want :-)
>
> $HADOOP_HOME
> + tmp
> + dfs
> + mapred
I am not sure a relative path works. When I tried a similar setting, I wasn't able to run jobs, though admittedly the problem was not the same as the one you hit. Could you try a full path, just in case?
Thanks
Hemanth
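Following the suggestion to use a full path, a small pre-flight check can catch relative directory settings before the daemons are started. This is a hypothetical helper in plain Java, not part of Hadoop: it takes property names and values (for example, as read from core-site.xml and mapred-site.xml) and reports any whose value, or any comma-separated entry in it, is not absolute. It assumes Unix-style paths, and /home/hadoop/hadoop-tmp is only an illustrative value.

```java
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AbsolutePathCheck {
    // Returns, sorted, the property names whose value (or any comma-separated
    // entry in it) is not an absolute path. Assumes Unix-style paths.
    public static List<String> relativeDirSettings(Map<String, String> conf) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            for (String entry : e.getValue().split(",")) {
                if (!Paths.get(entry.trim()).isAbsolute()) {
                    bad.add(e.getKey());
                    break; // one relative entry is enough to flag the property
                }
            }
        }
        bad.sort(null); // natural (alphabetical) order for stable output
        return bad;
    }

    public static void main(String[] args) {
        // The relative value from this thread, and a hypothetical absolute replacement.
        System.out.println(relativeDirSettings(Map.of("hadoop.tmp.dir", "tmp")));
        // [hadoop.tmp.dir]
        System.out.println(relativeDirSettings(Map.of("hadoop.tmp.dir", "/home/hadoop/hadoop-tmp")));
        // []
    }
}
```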
-
RE: Hadoop 0.20.2: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008131730_0001/attempt_201008131730_0001_m_000000_2/output/file.out.index in any of the configured local directories
Kevin . 2010-08-16, 03:07
Hi Hemanth, thanks for your reply!
I tried your recommendation of an absolute path and it worked: I was able to run the jobs successfully. Thank you! I was wondering, though, why hadoop.tmp.dir (or mapred.local.dir?) with a relative path didn't work.
Thanks.
-
Re: Hadoop 0.20.2: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008131730_0001/attempt_201008131730_0001_m_000000_2/output/file.out.index in any of the configured local directories
Allen Wittenauer 2010-08-16, 06:44
On Aug 15, 2010, at 8:07 PM, Kevin . wrote:
> I tried your recommendation, absolute path, it worked, I was able to run the jobs successfully. Thank you!
> I was wondering why hadoop.tmp.dir ( or mapred.local.dir ? ) with relative path didn't work.

Allen's Hadoop Operations Rule #1: Nothing works the way you expect it would.
;)
-
RE: Hadoop 0.20.2: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008131730_0001/attempt_201008131730_0001_m_000000_2/output/file.out.index in any of the configured local directories
Kevin . 2010-08-16, 10:31
> Allen's Hadoop Operations Rule #1: Nothing works the way you expect it would.
>
> ;)
You've hit the nail on the head, Allen.
:-)
-
Re: Hadoop 0.20.2: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008131730_0001/attempt_201008131730_0001_m_000000_2/output/file.out.index in any of the configured local directories
Patrick Angeles 2010-08-16, 14:56
I'd also recommend setting mapred.local.dir and dfs.data.dir to something that is not under /tmp.
Aside from the risk of your HDFS data getting wiped (/tmp is often cleaned out on reboot), these settings should ideally be comma-separated lists of paths, one per physical disk in your server, so that you can aggregate disk I/O.
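To illustrate why one entry per physical disk helps, here is a simplified Java sketch that parses a comma-separated mapred.local.dir value and hands directories out in rotation, so consecutive writes land on different disks. The mount points are hypothetical, and Hadoop's own LocalDirAllocator also takes free space into account, so this is not its actual policy.

```java
import java.util.ArrayList;
import java.util.List;

public class LocalDirRoundRobin {
    private final List<String> dirs = new ArrayList<>();
    private int next = 0;

    // Parse a comma-separated setting such as mapred.local.dir, skipping blanks.
    public LocalDirRoundRobin(String commaSeparated) {
        for (String d : commaSeparated.split(",")) {
            String trimmed = d.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(trimmed);
            }
        }
        if (dirs.isEmpty()) {
            throw new IllegalArgumentException("no directories configured");
        }
    }

    // Hand out directories in rotation so successive writes are spread across disks.
    public synchronized String nextDir() {
        String d = dirs.get(next);
        next = (next + 1) % dirs.size();
        return d;
    }

    public static void main(String[] args) {
        // Hypothetical mount points, one per physical disk.
        LocalDirRoundRobin alloc = new LocalDirRoundRobin(
                "/disk1/mapred/local,/disk2/mapred/local,/disk3/mapred/local");
        for (int i = 0; i < 4; i++) {
            System.out.println(alloc.nextDir()); // disk1, disk2, disk3, then disk1 again
        }
    }
}
```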
-
Re: Hadoop 0.20.2: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201008131730_0001/attempt_201008131730_0001_m_000000_2/output/file.out.index in any of the configured local directories
Hemanth Yamijala 2010-08-17, 08:08
Hi,
> I was wondering why hadoop.tmp.dir ( or mapred.local.dir ? ) with relative path didn't work.
I am not entirely sure, but when the daemon is launched on a slave node, I don't really know what its current working directory is set to, so where a relative path ends up is unpredictable.
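A minimal Java sketch of the working-directory point, using only the standard library: a relative value is appended to whatever directory the JVM was started in, while an absolute value resolves the same everywhere. The two working directories are hypothetical, and this does not reproduce how the TaskTracker actually launches its processes or resolves its directories.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class RelativeDirResolution {
    // Resolve a configured directory as a JVM started in workingDir would:
    // absolute values come back unchanged, relative ones are appended to workingDir.
    public static Path resolveAgainst(Path workingDir, String configured) {
        return workingDir.resolve(configured).normalize();
    }

    public static void main(String[] args) {
        // ${hadoop.tmp.dir}/mapred/local with hadoop.tmp.dir=tmp, as in this thread.
        String configured = "tmp/mapred/local";
        // Hypothetical working directories of two different processes.
        Path daemonCwd = Paths.get("/home/hadoop/hadoop-0.20.2");
        Path otherCwd = Paths.get("/home/hadoop");
        System.out.println(resolveAgainst(daemonCwd, configured));
        System.out.println(resolveAgainst(otherCwd, configured));
        // An absolute value is the same no matter where the JVM starts.
        System.out.println(resolveAgainst(otherCwd, "/data/hadoop/mapred/local"));
        // What this JVM would actually use for a relative path:
        System.out.println(System.getProperty("user.dir"));
        System.out.println(Paths.get(configured).toAbsolutePath());
    }
}
```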