Re: I am trying to run a large job and it is consistently failing with timeout - nothing happens for 600 sec
Michael Segel 2012-01-19, 03:08
But Steve, it is your code... :-)

Here is a simple test...

Set your code up where the run fails...

Add a simple timer to see how long you spend in the Mapper.map() method.

Only print out the time if it's greater than, let's say, 500 seconds...

The other thing is to update a dynamic counter in Mapper.map().
This would force a status update to be sent to the JT.
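
A minimal sketch of both ideas, assuming the newer org.apache.hadoop.mapreduce API; the class name, counter group, and 500-second threshold are illustrative, not Steve's actual code:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Illustrative mapper: time each map() call and bump a dynamic counter.
    // Incrementing the counter pushes a status update back to the JobTracker.
    public class TimedMapper extends Mapper<LongWritable, Text, Text, Text> {

        private static final long REPORT_THRESHOLD_MS = 500L * 1000L; // ~500 seconds

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            long start = System.currentTimeMillis();

            // ... your existing per-record work and context.write() calls ...

            // Dynamic counter: created on first use, reported with each heartbeat.
            context.getCounter("Debug", "MapCallsCompleted").increment(1);

            long elapsed = System.currentTimeMillis() - start;
            if (elapsed > REPORT_THRESHOLD_MS) {
                // Only log the slow calls, per the suggestion above.
                System.err.println("map() took " + elapsed + " ms for key " + key);
            }
        }
    }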

Also, you don't give a lot of detail...
Are you writing out to an HBase table???

HTH

-Mike

On Jan 18, 2012, at 6:21 PM, Steve Lewis wrote:

> 1) I do a lot of progress reporting.
> 2) Why would the job succeed when the only change in the code is
>      if (NumberWrites++ % 100 == 0)
>              context.write(key, value);
> Comment out the test, allowing full writes, and the job fails.
> Since every write is a report, I assume that something in the write code or
> other hadoop code for dealing with output is failing. I do increment a
> counter for every write, or in the case of the above code, every potential write.
> What I am seeing is that wherever the timeout occurs, it is not in a place
> where I am capable of inserting more reporting.
>
>
>
> On Wed, Jan 18, 2012 at 4:01 PM, Leonardo Urbina <[EMAIL PROTECTED]> wrote:
>
>> Perhaps you are not reporting progress throughout your task. If you
>> happen to run a large enough job, you hit the default timeout,
>> mapred.task.timeout (which defaults to 10 min). Perhaps you should
>> consider reporting progress in your mapper/reducer by calling
>> progress() on the Reporter object. Check tip 7 of this link:
>>
>> http://www.cloudera.com/blog/2009/05/10-mapreduce-tips/
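
For example (a hedged sketch, not the original job's code), with the old org.apache.hadoop.mapred API the Reporter is handed to map(), so a long-running record can ping the framework directly; with the newer mapreduce API, context.progress() plays the same role:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Illustrative mapper for the old mapred API: report progress during
    // long-running records so the task is not killed by mapred.task.timeout.
    public class ProgressReportingMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // ... long-running per-record work and output.collect() calls ...
            reporter.progress();                           // heartbeat to the framework
            reporter.setStatus("still processing " + key); // optional status string
        }
    }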
>>
>> Hope that helps,
>> -Leo
>>
>> Sent from my phone
>>
>> On Jan 18, 2012, at 6:46 PM, Steve Lewis <[EMAIL PROTECTED]> wrote:
>>
>>> I KNOW it is a task timeout - what I do NOT know is WHY merely cutting the
>>> number of writes causes it to go away. It seems to imply that some
>>> context.write operation or something downstream from that is taking a huge
>>> amount of time, and that is all hadoop internal code - not mine. So my
>>> question is: why should increasing the number and volume of writes cause a
>>> task to time out?
>>>
>>> On Wed, Jan 18, 2012 at 2:33 PM, Tom Melendez <[EMAIL PROTECTED]> wrote:
>>>
>>>> Sounds like mapred.task.timeout?  The default is 10 minutes.
>>>>
>>>> http://hadoop.apache.org/common/docs/current/mapred-default.html
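
If the long writes turn out to be legitimate, one blunt workaround is to raise that timeout for the job. A sketch, assuming the 0.20-era API; the property name is from that page, the value is in milliseconds, and the 30-minute figure is illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class JobDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Raise the task timeout from the default 600,000 ms (10 min) to 30 min.
            conf.setLong("mapred.task.timeout", 30L * 60L * 1000L);
            Job job = new Job(conf, "large-output-job");
            // ... set mapper, combiner, input/output formats and paths as usual ...
        }
    }

The same property can be passed on the command line as -D mapred.task.timeout=1800000 when the driver uses ToolRunner.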
>>>>
>>>> Thanks,
>>>>
>>>> Tom
>>>>
>>>> On Wed, Jan 18, 2012 at 2:05 PM, Steve Lewis <[EMAIL PROTECTED]>
>>>> wrote:
>>>>> The map tasks fail, timing out after 600 sec.
>>>>> I am processing one 9 GB file with 16,000,000 records. Each record (think
>>>>> of it as a line) generates hundreds of key value pairs.
>>>>> The job is unusual in that the output of the mapper, in terms of records or
>>>>> bytes, is orders of magnitude larger than the input.
>>>>> I have no idea what is slowing down the job except that the problem is in
>>>>> the writes.
>>>>>
>>>>> If I change the job to merely bypass a fraction of the context.write
>>>>> statements, the job succeeds.
>>>>> Below are the counters from one map task that failed and one that succeeded -
>>>>> I cannot understand how a write can take so long,
>>>>> or what else the mapper might be doing.
>>>>>
>>>>> JOB FAILED WITH TIMEOUT
>>>>>
>>>>> Parser: TotalProteins 90,103; NumberFragments 10,933,089
>>>>> FileSystemCounters: HDFS_BYTES_READ 67,245,605; FILE_BYTES_WRITTEN 444,054,807
>>>>> Map-Reduce Framework: Combine output records 10,033,499; Map input records 90,103;
>>>>> Spilled Records 10,032,836; Map output bytes 3,520,182,794;
>>>>> Combine input records 10,844,881; Map output records 10,933,089
>>>>> Same code but fewer writes
>>>>> JOB SUCCEEDED
>>>>>
>>>>> Parser: TotalProteins 90,103; NumberFragments 206,658,758
>>>>> FileSystemCounters: FILE_BYTES_READ 111,578,253; HDFS_BYTES_READ 67,245,607;
>>>>> FILE_BYTES_WRITTEN 220,169,922
>>>>> Map-Reduce Framework: Combine output records 4,046,128; Map input records 90,103;
>>>>> Spilled Records 4,046,128; Map output bytes 662,354,413; Combine input