-
what does it mean when a job fails at 100%?
Mike Kendall 2009-11-13, 22:02
title says it all.. this isn't the first job i've written either. very confused.
-
Re: what does it mean when a job fails at 100%?
Edmund Kohlwey 2009-11-13, 22:15
Lots of things can happen. If you have a cleanup method, it can fail after map and reduce complete. Also, Hadoop writes the output of a task to local disk and only commits the results of the individual tasks to HDFS after they complete, so you might be failing on the copy to HDFS.
-
Re: what does it mean when a job fails at 100%?
brien colwell 2009-11-13, 22:15
It could be that the result can't be written to HDFS. Is there any hint in the log? I recently encountered this behavior when writing many files back.
-
Re: what does it mean when a job fails at 100%?
Ashutosh Chauhan 2009-11-13, 22:16
Hi Mike,
The percentage reported is the percentage of records read by the framework, not the percentage of records processed. For example, if your data contains only one record, the framework will report 100% as soon as that record is read, even though you might still be doing a lot of processing on it. Second, there can be floating-point errors: after reading 9991 of the 10000 records in a split, the counter may say 100% while some records are still untouched. Lastly, if you are using the close() method, your task might be failing there, after the framework has already reported 100%. I am not an expert on counters, so you may want to hear from others before believing what I am saying :)
Thanks, Ashutosh
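To make the rounding point concrete, here is a toy model in Python. It only illustrates the claim above and is not Hadoop's actual progress code; the function name `reported_progress` and the whole-percent rounding are assumptions.

```python
def reported_progress(records_read, total_records):
    """Toy model: progress is the share of input READ, rounded for display.

    Hypothetical helper, not a Hadoop API. It says nothing about how many
    records have finished processing.
    """
    if total_records == 0:
        return 100
    return int(round(100.0 * records_read / total_records))


# A single-record split shows 100% the moment that record is read.
single = reported_progress(1, 1)

# 9991 of 10000 records read is 99.91%, which rounds up to 100 for display.
nearly_done = reported_progress(9991, 10000)

print(single, nearly_done)
```

In this model a task can sit at a displayed 100% while it is still working through the last records, or while close() runs.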
-
Re: what does it mean when a job fails at 100%?
Mike Kendall 2009-11-13, 23:03
Hmm.. let's collect some error messages. looks like the same task failed 4 times... is there a way that i can get better logs about this task?
MapAttempt TASK_TYPE="MAP" TASKID="task_200911131440_0001_m_000307" TASK_ATTEMPT_ID="attempt_200911131440_0001_m_000307_0" TASK_STATUS="FAILED" FINISH_TIME="1258123661159" HOSTNAME="hadoop3\.justin\.tv" ERROR="java\.lang\.RuntimeException: PipeMapRed\.waitOutputThreads(): subprocess failed with code 1 at org\.apache\.hadoop\.streaming\.PipeMapRed\.waitOutputThreads(PipeMapRed\.java:311)
MapAttempt TASK_TYPE="MAP" TASKID="task_200911131440_0001_m_000307" TASK_ATTEMPT_ID="attempt_200911131440_0001_m_000307_1" TASK_STATUS="FAILED" FINISH_TIME="1258123852424" HOSTNAME="hadoop1\.justin\.tv" ERROR="java\.lang\.RuntimeException: PipeMapRed\.waitOutputThreads(): subprocess failed with code 1 at org\.apache\.hadoop\.streaming\.PipeMapRed\.waitOutputThreads(PipeMapRed\.java:311)
MapAttempt TASK_TYPE="MAP" TASKID="task_200911131440_0001_m_000307" TASK_ATTEMPT_ID="attempt_200911131440_0001_m_000307_2" TASK_STATUS="FAILED" FINISH_TIME="1258123725938" HOSTNAME="hadoop4\.justin\.tv" ERROR="java\.lang\.RuntimeException: PipeMapRed\.waitOutputThreads(): subprocess failed with code 1 at org\.apache\.hadoop\.streaming\.PipeMapRed\.waitOutputThreads(PipeMapRed\.java:311)
MapAttempt TASK_TYPE="MAP" TASKID="task_200911131440_0001_m_000307" TASK_ATTEMPT_ID="attempt_200911131440_0001_m_000307_3" TASK_STATUS="FAILED" FINISH_TIME="1258123756980" HOSTNAME="hadoop2\.justin\.tv" ERROR="java\.lang\.RuntimeException: PipeMapRed\.waitOutputThreads(): subprocess failed with code 1 at org\.apache\.hadoop\.streaming\.PipeMapRed\.waitOutputThreads(PipeMapRed\.java:311)
Task TASKID="task_200911131440_0001_m_000307" TASK_TYPE="MAP" TASK_STATUS="FAILED" FINISH_TIME="1258123756980" ERROR="java\.lang\.RuntimeException: PipeMapRed\.waitOutputThreads(): subprocess failed with code 1 at org\.apache\.hadoop\.streaming\.PipeMapRed\.waitOutputThreads(PipeMapRed\.java:311)
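For anyone digging through the same kind of job history lines, a rough Python sketch for pulling out the fields might look like this. It assumes each line is a run of KEY="value" pairs with backslash-escaped characters, as in the excerpt above; `parse_history_line` and `SAMPLE` are hypothetical names, not part of Hadoop.

```python
import re

# KEY="value", where the value may contain backslash-escaped characters.
PAIR = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_history_line(line):
    """Return a dict of the KEY="value" pairs found on one history line.

    Backslash escapes such as "\\." are undone. A value with no closing
    quote (like the truncated ERROR field in the excerpt) is skipped.
    """
    return {key: re.sub(r"\\(.)", r"\1", value)
            for key, value in PAIR.findall(line)}


# Shortened copy of the first failed attempt from the excerpt.
SAMPLE = (r'MapAttempt TASK_TYPE="MAP" '
          r'TASKID="task_200911131440_0001_m_000307" '
          r'TASK_ATTEMPT_ID="attempt_200911131440_0001_m_000307_0" '
          r'TASK_STATUS="FAILED" FINISH_TIME="1258123661159" '
          r'HOSTNAME="hadoop3\.justin\.tv"')

fields = parse_history_line(SAMPLE)
print(fields["TASK_ATTEMPT_ID"], fields["HOSTNAME"], fields["TASK_STATUS"])
```

Grouping the parsed lines by TASKID and HOSTNAME makes it easy to see that one task failed on four different hosts, which points at the input split or the script rather than at a single bad machine.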
-
Re: what does it mean when a job fails at 100%?
Mike Kendall 2009-11-13, 23:19
oh and just fyi this is the only failed task. everything else works just fine. maybe the data copied over incorrectly or was malformed... /me checks
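The "subprocess failed with code 1" error means the streaming script itself exited with status 1. If malformed input does turn out to be the cause, one option is to have the script skip bad records and report them on stderr instead of dying. A minimal Python sketch, assuming a made-up record format of key, TAB, integer value; `run_mapper` is a hypothetical name:

```python
import sys


def run_mapper(lines, out=sys.stdout, err=sys.stderr):
    """Emit key<TAB>int(value) for each good record; log and skip bad ones.

    The record format here is an assumption for illustration only.
    Returns the number of malformed records that were skipped.
    """
    bad = 0
    for lineno, line in enumerate(lines, 1):
        try:
            # A line without a TAB, or a non-integer value, raises ValueError.
            key, value = line.rstrip("\n").split("\t", 1)
            out.write("%s\t%d\n" % (key, int(value)))
        except ValueError as exc:
            bad += 1
            err.write("skipping malformed record %d: %r (%s)\n"
                      % (lineno, line, exc))
    return bad

# In a real streaming script the entry point would be:
#     run_mapper(sys.stdin)
```

Whatever the script writes to stderr should show up in the failing task attempt's logs, so the offending record can be identified from there.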