Hadoop, mail # user - Reduce Hangs at 66%


Re: Reduce Hangs at 66%
Michael Segel 2012-05-04, 15:22
Well,
That was one of the things I had asked: ulimit -a says it all.

But you have to set the limits for each of the users: hdfs, mapred, and hadoop.

(Which is why I asked about which flavor.)
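
For anyone following along: on Linux the per-user open-file limit is usually raised in /etc/security/limits.conf. A sketch of the entries Mike is describing (the user names and the 65536 value are illustrative starting points, not from this thread; match them to whichever accounts actually run the daemons on your distribution):

```
hdfs    -   nofile   65536
mapred  -   nofile   65536
hadoop  -   nofile   65536
```

Verify the new limit with something like `su - hdfs -c 'ulimit -n'`, and note the daemons must be restarted before it takes effect.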

On May 3, 2012, at 7:03 PM, Raj Vishwanathan wrote:

> Keith
>
> What is the output of ulimit -n? Your value for the number of open files is probably too low.
>
> Raj
>
>
>
>
>> ________________________________
>> From: Keith Thompson <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Sent: Thursday, May 3, 2012 4:33 PM
>> Subject: Re: Reduce Hangs at 66%
>>
>> I am not sure about the ulimits, but I can answer the rest. It's a Cloudera
>> distribution of Hadoop 0.20.2. HDFS has 9 TB free. In the reduce step,
>> I am taking keys of the form (gridID, date), each with a value of 1. The
>> reduce step just sums the 1s to produce the final output value for the key
>> (it's counting how many people were in the gridID on a certain day).
>>
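
Stripped of the Hadoop API, the aggregation described above is just a per-key sum of 1s emitted by the map step. A minimal stand-alone sketch of that logic (class, method, and key names here are illustrative, not from the actual job):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DensityCount {
    // Each entry stands for a (gridID, date) key the map step emitted with value 1.
    static Map<String, Integer> reduce(List<String> emittedKeys) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String key : emittedKeys) {
            // Summing the 1s is the same as counting occurrences of the key.
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        emitted.add("g42|2012-05-02");
        emitted.add("g42|2012-05-02");
        emitted.add("g7|2012-05-02");
        System.out.println(reduce(emitted)); // {g42|2012-05-02=2, g7|2012-05-02=1}
    }
}
```

In the real job the framework, not the reducer, groups the values by key; the reducer only performs the sum shown in the loop body.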
>> I have been running other, more complicated jobs with no problem, so I'm not
>> sure why this one is being peculiar. This is the command I used to run the
>> program (the source is a file on HDFS):
>>
>> hadoop jar <jarfile> <driver> <source> /thompson/outputDensity/density1
>>
>> The program then executes the map and gets to 66% of the reduce, then stops
>> responding. The cause of the error seems to be:
>>
>> Error from attempt_201202240659_6432_r_000000_1: java.io.IOException:
>>> The temporary job-output directory
>>> hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
>>> doesn't exist!
>>
>> I don't understand what the _temporary directory is. I am assuming it's
>> something Hadoop creates automatically.
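
It is: Hadoop's FileOutputCommitter stages each task's output under <output>/_temporary and promotes it into the output directory when the task commits. The "doesn't exist" error usually means something removed that directory mid-job, most often a second job (or a re-run) pointed at the same output path. One defensive habit is to give every run a unique output directory; a sketch of that idea (the method and path naming are mine, not from this job):

```java
public class OutputPaths {
    // Build a distinct output directory per run so concurrent or re-run jobs
    // can never delete each other's _temporary staging area.
    static String uniqueOutputDir(String base, long runId) {
        return base + "/run_" + runId;
    }

    public static void main(String[] args) {
        String out = uniqueOutputDir("/thompson/outputDensity", System.currentTimeMillis());
        System.out.println(out);
        // In the driver, this string would be passed to FileOutputFormat.setOutputPath(...).
    }
}
```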
>>
>>
>>
>> On Thu, May 3, 2012 at 5:02 AM, Michel Segel <[EMAIL PROTECTED]>wrote:
>>
>>> Well...
>>> Lots of information but still missing some of the basics...
>>>
>>> Which release and version?
>>> What are your ulimits set to?
>>> How much free disk space do you have?
>>> What are you attempting to do?
>>>
>>> Stuff like that.
>>>
>>>
>>>
>>> Sent from a remote device. Please excuse any typos...
>>>
>>> Mike Segel
>>>
>>> On May 2, 2012, at 4:49 PM, Keith Thompson <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> I am running a task which gets to 66% of the Reduce step and then hangs
>>>> indefinitely. Here is the log file (I apologize if I am putting too much
>>>> here but I am not exactly sure what is relevant):
>>>>
>>>> 2012-05-02 16:42:52,975 INFO org.apache.hadoop.mapred.JobTracker:
>>>> Adding task (REDUCE) 'attempt_201202240659_6433_r_000000_0' to tip
>>>> task_201202240659_6433_r_000000, for tracker
>>>> 'tracker_analytix7:localhost.localdomain/127.0.0.1:56515'
>>>> 2012-05-02 16:42:53,584 INFO org.apache.hadoop.mapred.JobInProgress:
>>>> Task 'attempt_201202240659_6433_m_000001_0' has completed
>>>> task_201202240659_6433_m_000001 successfully.
>>>> 2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.TaskInProgress:
>>>> Error from attempt_201202240659_6432_r_000000_0: Task
>>>> attempt_201202240659_6432_r_000000_0 failed to report status for 1800
>>>> seconds. Killing!
>>>> 2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker:
>>>> Removing task 'attempt_201202240659_6432_r_000000_0'
>>>> 2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker:
>>>> Adding task (TASK_CLEANUP) 'attempt_201202240659_6432_r_000000_0' to
>>>> tip task_201202240659_6432_r_000000, for tracker
>>>> 'tracker_analytix4:localhost.localdomain/127.0.0.1:44204'
>>>> 2012-05-02 17:00:48,763 INFO org.apache.hadoop.mapred.JobTracker:
>>>> Removing task 'attempt_201202240659_6432_r_000000_0'
>>>> 2012-05-02 17:00:48,957 INFO org.apache.hadoop.mapred.JobTracker:
>>>> Adding task (REDUCE) 'attempt_201202240659_6432_r_000000_1' to tip
>>>> task_201202240659_6432_r_000000, for tracker
>>>> 'tracker_analytix5:localhost.localdomain/127.0.0.1:59117'
>>>> 2012-05-02 17:00:56,559 INFO org.apache.hadoop.mapred.TaskInProgress:
>>>> Error from attempt_201202240659_6432_r_000000_1: java.io.IOException: