We set a part of the failure reason as the diagnostic message for a
failed task that a JobClient API retrieves/can retrieve:
Often this is
'useless' given the stack trace's top part isn't always carrying the
most relevant information, so perhaps HADOOP-9861 may help here once
it is checked in.
On Tue, Aug 27, 2013 at 10:34 AM, Gopi Krishna M <[EMAIL PROTECTED]> wrote:
> We are seeing our map-reduce jobs crashing once in a while and have to go
> through the logs on all the nodes to figure out what went wrong. Sometimes
> it is low resources and sometimes it is a programming error which is
> triggered on specific inputs.. Same is true for some of our hive queries.
> Are there any tools (free/paid) which help us to do this debugging quickly?
> I am planning to write a debugging tool for sifting through the distributed
> logs of hadoop but wanted to check if there are already any useful tools for
> Gopi | www.wignite.com