MapReduce user mailing list

Re: How to record the bad records encountered by hadoop

On Tue, Dec 21, 2010 at 1:36 AM, felix gao <[EMAIL PROTECTED]> wrote:
> All,
> Not sure if this is the right mailing list for this question. I am using Pig
> to do some data analysis, and I am wondering if there is a way to tell Hadoop,
> when it encounters bad log files (due to decompression failures or
> whatever else caused the job to die), to record the line and, if possible, the
> filename it is working on in some log, so I can go back and take a look
> at it later.

If you set the option "mapreduce.task.files.preserve.failedtasks" to true, the
input files that caused the failure will be preserved. You could then
take that sample and run the job on just that subset of files to
understand more about the failure. Would this help?
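The original question also asks for the filename and line of each bad input. A rough local check can be run on the preserved sample. Below is a minimal Python sketch. It assumes the logs are gzip-compressed UTF-8 text copied to local disk. find_bad_records is a hypothetical helper and is not part of Hadoop or Pig. It reads each file to the end and records the filename, how many lines were read cleanly, and the error for any file that fails to decompress or decode.

```python
import gzip
import sys
import zlib
from typing import List, Tuple

def find_bad_records(paths: List[str]) -> List[Tuple[str, int, str]]:
    """Hypothetical helper: scan gzip-compressed text files and report failures.

    Returns (filename, lines_read, error) for each file that could not be
    read to the end. lines_read is the number of lines yielded before the
    error (0 if none); because reads are buffered, the damaged data lies
    somewhere after that line.
    """
    problems = []
    for path in paths:
        line_no = 0
        try:
            # Assumption: logs are gzip-compressed UTF-8 text.
            with gzip.open(path, "rt", encoding="utf-8") as f:
                for line_no, _line in enumerate(f, start=1):
                    pass  # We only care whether each line can be read.
        except (OSError, EOFError, zlib.error, UnicodeDecodeError) as exc:
            # OSError covers missing files and bad gzip headers or CRCs,
            # EOFError covers truncated files, zlib.error corrupt deflate data.
            problems.append((path, line_no, f"{type(exc).__name__}: {exc}"))
    return problems

if __name__ == "__main__":
    # Usage: python find_bad_records.py sample1.gz sample2.gz ...
    for path, line_no, error in find_bad_records(sys.argv[1:]):
        print(f"{path}\tlines read before error: {line_no}\t{error}")
```

Reading every file to the end is the simplest way to trigger each failure mode. The trade-off is that the reported line number is only a lower bound on where the damage starts.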

BTW, the name of this property changed in 0.21, so if you are using an
older version, it might be called something a little different. You
could look at the mapred-default.xml file in the source to find it.
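The property name differs between releases. One way to follow the mapred-default.xml suggestion is to search that file for candidate names. The sketch below uses a hypothetical helper, find_properties, built on Python's standard xml.etree.ElementTree. It assumes the usual Hadoop configuration layout: a <configuration> root holding <property> elements with <name> and <value> children. The path in the usage comment is a placeholder for wherever mapred-default.xml sits in your checkout.

```python
import sys
import xml.etree.ElementTree as ET
from typing import List, Tuple

def find_properties(xml_path: str, keywords: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: list (name, default value) pairs from a Hadoop
    *-default.xml file whose property name contains every keyword."""
    root = ET.parse(xml_path).getroot()
    matches = []
    # Hadoop config files hold <property> children, each with
    # <name> and <value> sub-elements.
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if all(keyword in name for keyword in keywords):
            matches.append((name, value))
    return matches

if __name__ == "__main__" and len(sys.argv) >= 2:
    # Usage: python find_properties.py path/to/mapred-default.xml failed files
    for name, value in find_properties(sys.argv[1], sys.argv[2:]):
        print(f"{name} = {value}")
```

Searching with keywords such as "failed" and "files" should surface the failed-task property under whatever name your release uses, along with its default value.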

> Thanks,
> Felix