Re: Question about Skip Bad Records

Please see comments in https://issues.apache.org/jira/browse/MAPREDUCE-1932

On Sat, Jun 15, 2013 at 12:09 PM, 小强 <[EMAIL PROTECTED]> wrote:
> Hi, I found that SkippingRecordReader is no longer supported in the new API,
> and I am curious about the reason. Can anyone tell me?
> Besides, when I looked into the old API to figure out what skip mode
> does, I was a little confused by the logic there.
> As I understand it, if the Java API is used we can always locate precisely
> which record is the bad one.
> If streaming is used, as long as the user handles the counter correctly (I
> mean increments the counter for each record read in), we can also locate the
> exact bad record. (I wonder if I am missing something here.)
> But if the user doesn't maintain the counter, it is always a disaster for the
> framework to locate bad records (even using binary search).
> To sum up:
> Ques 1: Why was skip mode removed in the new API?
> Ques 2: If the user handles the counter correctly in streaming, can we locate the
> exact bad record?
> Ques 3: In skip mode, why not locate more bad records by restarting the
> user logic, instead of locating one bad record per task attempt?
> Thank you!
> Dasheng Jiang

Harsh J
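
For reference on Ques 2, here is a minimal sketch of a streaming mapper that reports each processed record through the skip counter. It assumes the old-API counter group `SkippingTaskCounters` and counter `MapProcessedRecords` (the constants defined in `org.apache.hadoop.mapred.SkipBadRecords`) and the streaming convention of sending counter updates on stderr as `reporter:counter:<group>,<counter>,<amount>`. `handle_record` and `run_mapper` are hypothetical names used only for illustration.

```python
import sys

# Counter names assumed from org.apache.hadoop.mapred.SkipBadRecords (old API).
COUNTER_GROUP = "SkippingTaskCounters"
COUNTER_NAME = "MapProcessedRecords"


def handle_record(record):
    """Hypothetical user logic: upper-case the record."""
    return record.upper()


def run_mapper(lines, out=sys.stdout, err=sys.stderr):
    """Process records one at a time and report each completed record."""
    for line in lines:
        record = line.rstrip("\n")
        out.write(handle_record(record) + "\n")
        # Increment the processed-records counter only after the record
        # has been fully handled, so a crash points at the next record.
        err.write("reporter:counter:%s,%s,1\n" % (COUNTER_GROUP, COUNTER_NAME))


# In an actual streaming job the script would end with:
#     run_mapper(sys.stdin)
```

With this approach the framework's automatic increment of the processed-record count would presumably need to be turned off (see `SkipBadRecords.setAutoIncrMapperProcCount`), since the script does the counting itself.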