Re: Question about Skip Bad Records
Harsh J 2013-06-15, 14:51
Please see comments in https://issues.apache.org/jira/browse/MAPREDUCE-1932
On Sat, Jun 15, 2013 at 12:09 PM, 小强 <[EMAIL PROTECTED]> wrote:
> Hi, I found that SkippingRecordReader is no longer supported in the new API,
> and I am curious about the reason. Can anyone tell me why?
> Besides, when I looked into the old API to figure out what skip mode
> was doing, I was a little confused by the logic there.
> As I understand it, if the Java API is used, we can always locate precisely
> which record is the bad one.
> If streaming is used, as long as the user handles the counter correctly (I
> mean incrementing the counter for each input record), we can also locate the
> exact bad record. (I wonder if I am missing something here.)
> But if the user doesn't care about the counter, it is always a disaster for the
> framework to locate bad records (even using binary search).
> To sum up:
> Ques 1: Why was skip mode removed in the new API?
> Ques 2: If the user handles the counter correctly in streaming, can we locate the
> exact bad record?
> Ques 3: In skip mode, why not locate more bad records by restarting the
> user logic, instead of locating one bad record per task attempt?
> Thank you!
> Dasheng Jiang
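
For readers following the thread, the binary search mentioned in the quoted message can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Hadoop's implementation: `run_attempt` and `locate_bad_record` are hypothetical names, each call to `run_attempt` stands in for one task attempt over a range of records, and the starting range is assumed to contain at least one bad record.

```python
from typing import Callable, Sequence, Tuple


def run_attempt(records: Sequence[str],
                process: Callable[[str], object],
                start: int,
                end: int) -> bool:
    """Simulate one task attempt over records[start:end].

    Returns True if every record is processed, False if the user
    logic raises on any record (the "task attempt" fails).
    """
    try:
        for rec in records[start:end]:
            process(rec)
        return True
    except Exception:
        return False


def locate_bad_record(records: Sequence[str],
                      process: Callable[[str], object],
                      start: int,
                      end: int) -> Tuple[int, int]:
    """Narrow a failing range [start, end) down to a single record.

    Assumes the range contains at least one bad record. Returns the
    index of the first bad record and the number of simulated
    attempts that were needed to isolate it.
    """
    attempts = 0
    lo, hi = start, end
    while hi - lo > 1:
        mid = (lo + hi) // 2
        attempts += 1
        if run_attempt(records, process, lo, mid):
            lo = mid  # first half is clean, so the failure is in the second half
        else:
            hi = mid  # first half failed, keep narrowing inside it
    return lo, attempts


if __name__ == "__main__":
    # Hypothetical input: int() plays the role of the user logic and
    # raises ValueError on the non-numeric record.
    data = ["1", "2", "3", "4", "x", "6", "7", "8"]
    index, attempts = locate_bad_record(data, int, 0, len(data))
    print(index, attempts)
```

In this simulation, isolating one bad record among n costs roughly log2(n) attempts, which gives a feel for why narrowing by range is expensive when no per-record counter pinpoints the failure.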