Keith Thompson 2012-05-02, 21:49
I am running a task which gets to 66% of the Reduce step and then hangs indefinitely. Here is the log file (I apologize if I am putting too much here but I am not exactly sure what is relevant):
2012-05-02 16:42:52,975 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201202240659_6433_r_000000_0' to tip task_201202240659_6433_r_000000, for tracker 'tracker_analytix7:localhost.localdomain/127.0.0.1:56515'
2012-05-02 16:42:53,584 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201202240659_6433_m_000001_0' has completed task_201202240659_6433_m_000001 successfully.
2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201202240659_6432_r_000000_0: Task attempt_201202240659_6432_r_000000_0 failed to report status for 1800 seconds. Killing!
2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201202240659_6432_r_000000_0'
2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201202240659_6432_r_000000_0' to tip task_201202240659_6432_r_000000, for tracker 'tracker_analytix4:localhost.localdomain/127.0.0.1:44204'
2012-05-02 17:00:48,763 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201202240659_6432_r_000000_0'
2012-05-02 17:00:48,957 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201202240659_6432_r_000000_1' to tip task_201202240659_6432_r_000000, for tracker 'tracker_analytix5:localhost.localdomain/127.0.0.1:59117'
2012-05-02 17:00:56,559 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201202240659_6432_r_000000_1: java.io.IOException: The temporary job-output directory hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary doesn't exist!
    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
2012-05-02 17:00:59,903 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201202240659_6432_r_000000_1'
2012-05-02 17:00:59,906 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201202240659_6432_r_000000_2' to tip task_201202240659_6432_r_000000, for tracker 'tracker_analytix3:localhost.localdomain/127.0.0.1:39980'
2012-05-02 17:01:07,200 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201202240659_6432_r_000000_2: java.io.IOException: The temporary job-output directory hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary doesn't exist!
    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
2012-05-02 17:01:10,239 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201202240659_6432_r_000000_2'
2012-05-02 17:01:10,283 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201202240659_6432_r_000000_3' to tip task_201202240659_6432_r_000000, for tracker 'tracker_analytix2:localhost.localdomain/127.0.0.1:33297'
2012-05-02 17:01:18,188 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201202240659_6432_r_000000_3: java.io.IOException: The temporary job-output directory hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary doesn't exist!
    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.TaskInProgress: TaskInProgress task_201202240659_6432_r_000000 has failed 4 times.
2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.JobInProgress: Aborting job job_201202240659_6432
2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.JobInProgress: Killing job 'job_201202240659_6432'
2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_CLEANUP) 'attempt_201202240659_6432_m_000002_0' to tip task_201202240659_6432_m_000002, for tracker 'tracker_analytix2:localhost.localdomain/127.0.0.1:33297'
2012-05-02 17:01:21,228 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201202240659_6432_r_000000_3'
2012-05-02 17:01:22,443 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_2012022
-
Re: Reduce Hangs at 66%
Michel Segel 2012-05-03, 09:02
Well... Lots of information but still missing some of the basics...
Which release and version? What are your ulimits set to? How much free disk space do you have? What are you attempting to do?
Stuff like that.
Sent from a remote device. Please excuse any typos...
Mike Segel
-
Re: Reduce Hangs at 66%
Keith Thompson 2012-05-03, 23:33
I am not sure about ulimits, but I can answer the rest. It's a Cloudera distribution of Hadoop 0.20.2. The HDFS has 9 TB free. In the reduce step, I am taking keys in the form of (gridID, date), each with a value of 1. The reduce step just sums the 1's as the final output value for the key (It's counting how many people were in the gridID on a certain day).
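For readers following along, the counting logic described above can be sketched in plain Java without any Hadoop dependency. This is only an illustration of the sum-the-1's idea, not the poster's actual reducer: the class name `DensityCountSketch`, the method `countByKey`, and the `"gridID,date"` string encoding of the key are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DensityCountSketch {
    // Hypothetical helper: each element of 'keys' stands for one (key, 1)
    // pair emitted by the map step. Summing the 1's per key gives the
    // number of records seen for that grid cell on that date.
    public static Map<String, Integer> countByKey(List<String> keys) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample keys in an assumed "gridID,date" encoding.
        List<String> mapOutputKeys = Arrays.asList(
                "grid42,2012-05-01",
                "grid42,2012-05-01",
                "grid7,2012-05-01");
        countByKey(mapOutputKeys)
                .forEach((key, count) -> System.out.println(key + "\t" + count));
    }
}
```

In a real job the framework groups the values by key before the reducer sees them; the map here just stands in for that grouping.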
I have been running other, more complicated jobs with no problem, so I'm not sure why this one is being peculiar. This is the command I used to run the program (the source is a file on HDFS):
hadoop jar <jarfile> <driver> <source> /thompson/outputDensity/density1
The program then executes the map and gets to 66% of the reduce, then stops responding. The cause of the error seems to be:
Error from attempt_201202240659_6432_r_000000_1: java.io.IOException: The temporary job-output directory hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary doesn't exist!
I don't understand what the _temporary is. I am assuming it's something Hadoop creates automatically.
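As background on `_temporary`: the file output committer creates it under the job's output directory, task attempts write their output beneath it, and committed output is later moved into the output directory itself. The sketch below only builds the path strings to show the layout. The `_temporary/_<attemptId>` pattern reflects how the old `org.apache.hadoop.mapred` committer of that era is understood to work and should be treated as an assumption; the class and method names are hypothetical and nothing here calls Hadoop.

```java
public class TemporaryPathSketch {
    // Directory name taken from the error message in the log above.
    static final String TEMP_DIR_NAME = "_temporary";

    // Hypothetical helper: where a task attempt would write its output
    // before commit, assuming the <output>/_temporary/_<attemptId> layout.
    public static String taskWorkPath(String jobOutputDir, String attemptId) {
        return jobOutputDir + "/" + TEMP_DIR_NAME + "/_" + attemptId;
    }

    public static void main(String[] args) {
        String out = "hdfs://analytix1:9000/thompson/outputDensity/density1";
        System.out.println(taskWorkPath(out, "attempt_201202240659_6432_r_000000_1"));
    }
}
```

If `_temporary` disappears while the job is still running (for example, because something else cleaned up or reused the same output directory), later task attempts would fail with an error like the one quoted above.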
*Keith Thompson* Graduate Research Associate, Xerox Corporation SUNY Research Foundation Dept. of Systems Science and Industrial Engineering Binghamton University Binghamton, NY 13902
-
Re: Reduce Hangs at 66%
Raj Vishwanathan 2012-05-04, 00:03
Keith
What is the output of ulimit -n? Your value for the number of open files is probably too low.
Raj
-
Re: Reduce Hangs at 66%
Michael Segel 2012-05-04, 15:22
Well, that was one of the things I had asked. ulimit -a says it all.
But you have to check this for each of the users: hdfs, mapred, and hadoop.
(Which is why I asked about which flavor.)
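Because the limit that matters is the one seen by the Hadoop daemon and task JVMs, it can also be read from inside a JVM. The following is a minimal sketch, assuming a HotSpot/OpenJDK-style JVM on a Unix-like system where `com.sun.management.UnixOperatingSystemMXBean` is available. The class name `FdLimitSketch` is made up for the example. The reported maximum may differ from the shell's `ulimit -n` soft limit, since some JVMs raise it toward the hard limit at startup.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdLimitSketch {
    // Maximum number of file descriptors for this JVM process,
    // or -1 if the platform bean does not expose it.
    public static long maxOpenFiles() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1L;
    }

    // Number of file descriptors currently open in this JVM process,
    // or -1 if the platform bean does not expose it.
    public static long openFiles() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println("max open files:     " + maxOpenFiles());
        System.out.println("currently open fds: " + openFiles());
    }
}
```

To compare accounts, the class would need to be run as each of the users that start Hadoop processes, since limits are applied per user and inherited by the processes that user launches.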
-
Re: Reduce Hangs at 66%
Keith Thompson 2012-05-04, 16:52
Thanks everyone for your help. It is running fine now.