felix gao 2010-12-20, 20:06
Re: How to record the bad records encountered by hadoop
Hemanth Yamijala 2010-12-26, 07:46
On Tue, Dec 21, 2010 at 1:36 AM, felix gao <[EMAIL PROTECTED]> wrote:
> Not sure if this is the right mailing list for this question. I am using pig
> to do some data analysis and I am wondering if there is a way to tell hadoop,
> when it encounters a bad log file, either due to decompression failures or
> whatever else caused the job to die, to record the line and if possible the
> filename it was working on in some log so I can go back and take a look
> at it later?
If you set the option "mapreduce.task.files.preserve.failedtasks" to true,
the input files that caused the failure would be preserved. You could then
take that sample and possibly rerun the job on just that subset of files to
understand more about the failure. Would this help?
BTW, the name of this property changed in Hadoop 0.21. So, if you are using
an older version, it might be called something a little different. You
could look it up in the mapred-default.xml file in the source.
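
To make the mapred-default.xml lookup a bit more concrete, here is a rough
Python sketch that searches a Hadoop-style configuration file for properties
whose name or description mentions a keyword. The helper name
find_properties and the inline SAMPLE_XML are made up for illustration; the
sample only mimics the file layout and is not copied from any release. As far
as I know the pre-0.21 name is keep.failed.task.files, but please check the
file shipped with your own version.

```python
import xml.etree.ElementTree as ET

# Illustrative sample in the Hadoop configuration file layout. It is an
# assumption for demonstration, not a verbatim copy of mapred-default.xml.
SAMPLE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>keep.failed.task.files</name>
    <value>false</value>
    <description>Should the files for failed tasks be kept.</description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>1</value>
    <description>The default number of reduce tasks per job.</description>
  </property>
</configuration>
"""


def find_properties(xml_text, keyword):
    """Return (name, value) pairs whose name or description mentions keyword."""
    root = ET.fromstring(xml_text)
    keyword = keyword.lower()
    matches = []
    for prop in root.iter("property"):
        # findtext returns None for a missing child, so fall back to "".
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        description = (prop.findtext("description") or "").strip()
        if keyword in name.lower() or keyword in description.lower():
            matches.append((name, value))
    return matches


if __name__ == "__main__":
    # Case-insensitive search for anything related to failed tasks.
    for name, value in find_properties(SAMPLE_XML, "failed"):
        print(name, "=", value)
```

To run it against a real installation, read the contents of your own
mapred-default.xml into a string and pass that in place of SAMPLE_XML.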