Hive, mail # user - Long running Join Query - Reduce task fails due to failing to report status


Re: Long running Join Query - Reduce task fails due to failing to report status
Bertrand Dechoux 2012-08-24, 18:11
It is not clear from your post: does your job always fail during the same
step, only sometimes, or did it fail just once?
Since it's a Hive query, I would modify it to find the root cause.

First, materialize the results of the first three M/R jobs as temporary
tables ("files").
Then run the fourth M/R job on them, and try filtering the data to see
whether the failure is related to the volume or to the format of the data.
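
As a rough check on the volume side, the ExecReducer lines quoted further
down in this thread can be turned into a throughput figure. The following is
a minimal Python sketch, not part of Hive: the function names are
hypothetical, and the regular expression is an assumption based only on the
log lines in the quoted message.

```python
import re
from datetime import datetime

# Log lines copied from the task attempt log quoted in this thread.
LOG_LINES = [
    "2012-08-23 19:17:23,848 INFO ExecReducer: ExecReducer: processing 219000000 rows: used memory = 408582240",
    "2012-08-23 19:17:30,189 INFO ExecReducer: ExecReducer: processing 220000000 rows: used memory = 346110400",
    "2012-08-23 19:17:37,510 INFO ExecReducer: ExecReducer: processing 221000000 rows: used memory = 583913576",
    "2012-08-23 19:17:44,829 INFO ExecReducer: ExecReducer: processing 222000000 rows: used memory = 513071504",
    "2012-08-23 19:17:47,923 INFO org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1",
]

# Assumed line layout: timestamp, row counter, used memory in bytes.
PROGRESS = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO ExecReducer: "
    r"ExecReducer: processing (\d+) rows: used memory = (\d+)"
)


def parse_samples(lines):
    """Return (timestamp, rows, used_memory) for each progress line."""
    samples = []
    for line in lines:
        match = PROGRESS.match(line)
        if match is None:
            continue  # skip lines that are not ExecReducer progress reports
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        samples.append((stamp, int(match.group(2)), int(match.group(3))))
    return samples


def rows_per_second(samples):
    """Average processing rate between the first and last sample."""
    first_time, first_rows, _ = samples[0]
    last_time, last_rows, _ = samples[-1]
    elapsed = (last_time - first_time).total_seconds()
    return (last_rows - first_rows) / elapsed


if __name__ == "__main__":
    samples = parse_samples(LOG_LINES)
    print(f"{len(samples)} progress samples")
    print(f"{rows_per_second(samples):,.0f} rows/s")
```

On those four samples it comes out at roughly 143,000 rows per second, which
only says the reducer was still consuming rows at that point; it does not
show where it later stalls.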

Regards

Bertrand

On Fri, Aug 24, 2012 at 7:44 PM, Igor Tatarinov <[EMAIL PROTECTED]> wrote:

> Why don't you try splitting the big query into smaller ones?
>
>
> On Fri, Aug 24, 2012 at 10:20 AM, Tim Havens <[EMAIL PROTECTED]> wrote:
>
>>
>> Just curious if you've tried using Hive's EXPLAIN statement to see what IT
>> thinks of your query.
>>
>>
>> On Fri, Aug 24, 2012 at 9:36 AM, Himanish Kushary <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> We have a complex query that involves several left outer joins, resulting
>>> in 8 M/R jobs in Hive. During execution of one of the stages (after three
>>> M/R jobs have run), the M/R job fails because a few reduce tasks are
>>> killed for inactivity.
>>>
>>> Most of the reduce tasks go through fine (within 3 mins), but the last
>>> one gets stuck for a long time (> 1 hour) and, after several attempts,
>>> finally gets killed with "failed to report status for 600 seconds.
>>> Killing!"
>>>
>>> What may be causing this issue? Would hive.script.auto.progress help in
>>> this case? As we are not able to get much information from the log files,
>>> how may we approach resolving this? Will tweaking any specific M/R
>>> parameters help?
>>>
>>> The task attempt log shows several lines like this before exiting :
>>>
>>> 2012-08-23 19:17:23,848 INFO ExecReducer: ExecReducer: processing 219000000 rows: used memory = 408582240
>>> 2012-08-23 19:17:30,189 INFO ExecReducer: ExecReducer: processing 220000000 rows: used memory = 346110400
>>> 2012-08-23 19:17:37,510 INFO ExecReducer: ExecReducer: processing 221000000 rows: used memory = 583913576
>>> 2012-08-23 19:17:44,829 INFO ExecReducer: ExecReducer: processing 222000000 rows: used memory = 513071504
>>> 2012-08-23 19:17:47,923 INFO org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1
>>>
>>> Here are the reduce task counters:
>>>
>>> *Map-Reduce Framework*
>>>   Combine input records                           0
>>>   Combine output records                          0
>>>   Reduce input groups                   222,480,335
>>>   Reduce shuffle bytes                7,726,141,897
>>>   Reduce input records                  222,480,335
>>>   Reduce output records                           0
>>>   Spilled Records                       355,827,191
>>>   CPU time spent (ms)                     2,152,160
>>>   Physical memory (bytes) snapshot    1,182,490,624
>>>   Virtual memory (bytes) snapshot     1,694,531,584
>>>   Total committed heap usage (bytes)    990,052,352
>>>
>>> The tasktracker log gives a thread dump at that time but no exception.
>>>
>>> *2012-08-23 20:05:49,319 INFO org.apache.hadoop.mapred.TaskTracker:
>>> Process Thread Dump: lost task*
>>> *69 active threads*
>>>
>>> ---------------------------
>>> Thanks & Regards
>>> Himanish
>>>
>>
>>
>>
>> --
>> "The whole world is you. Yet you keep thinking there is something else."
>> - Xuefeng Yicun 822-902 A.D.
>>
>> Tim R. Havens
>> Google Phone: 573.454.1232
>> ICQ: 495992798
>> ICBM:  37°51'34.79"N   90°35'24.35"W
>> ham radio callsign: NW0W
>>
>
>
--
Bertrand Dechoux