
Question about Skip Bad Records
Hi, I found that SkippingRecordReader is no longer supported in the new API, and I am curious about the reason. Can anyone tell me why?
Also, when I looked into the old API to figure out what skip mode was doing, I was a little confused by the logic there.
As I understand it, if the Java API is used, we can always locate precisely which record is the bad one.
If streaming is used, as long as the user handles the counter correctly (I mean incrementing the counter for each record processed), we can also locate the exact bad record. (I wonder if I am missing something here.)
But if the user doesn't maintain the counter, it is always a disaster for the framework to locate bad records (even using binary search).
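To make the streaming case concrete, here is a minimal sketch of what I mean by handling the counter correctly. It assumes the streaming convention of reporting counter updates as `reporter:counter:<group>,<counter>,<amount>` lines on stderr, and it assumes the group and counter names `SkippingTaskCounters` and `MapProcessedRecords` from the old-API SkipBadRecords class (please check them against your Hadoop version). `handle_record` and `run_mapper` are hypothetical names used only for illustration.

```python
# Assumed counter names from the old-API SkipBadRecords class.
COUNTER_GROUP = "SkippingTaskCounters"
COUNTER_NAME = "MapProcessedRecords"


def handle_record(line):
    """Hypothetical user logic: emit the record upper-cased."""
    return line.upper()


def run_mapper(records, out, err):
    """Process records one at a time, reporting each completed record.

    The counter is incremented only after a record has been fully
    processed, so the reported count tracks how far the mapper got.
    """
    processed = 0
    for raw in records:
        line = raw.rstrip("\n")
        out.write(handle_record(line) + "\n")
        # Streaming picks up counter updates written to stderr in this form.
        err.write(f"reporter:counter:{COUNTER_GROUP},{COUNTER_NAME},1\n")
        processed += 1
    return processed
```

In a real streaming job this would be called as `run_mapper(sys.stdin, sys.stdout, sys.stderr)`. If the mapper crashes on record N, only N-1 increments have been reported, which is why I think the framework could pinpoint the bad record in this case.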
To sum up:
Ques 1:  Why was skip mode removed in the new API?
Ques 2:  If the user handles the counter correctly in streaming, can we locate the exact bad record?
Ques 3:  In skip mode, why not locate more bad records by restarting the user logic, instead of locating only one bad record per task attempt?
Thank you!
Dasheng Jiang