-
Fate of skipping-bad-records feature (MAPREDUCE-4840)
Mostafa Elhemali 2012-12-03, 20:26
(Taking the discussion out of the JIRA so I can tap into the historical and other knowledge of the group)
Hi all,
I was looking through MR code in trunk (always a fun weekend activity) and was puzzled by the loose ends around the feature to skip bad records; by loose ends I mean there's a lot of code in there but it can't really work. I dug through JIRAs and found MAPREDUCE-1932, which I thought implied that the feature is now intentionally dead, so I filed MAPREDUCE-4840 to delete dead code and deprecate APIs. In that new JIRA, however, Harsh corrected me, pointing out that MAPREDUCE-1932 only applied to the new API. So my questions are:
1. Do we want to support the skip-bad-records feature for the old API in trunk? Personally I think it's a bit weird to tie this feature to which API you use, since the feature is configured by config file, not as part of the API, but I don't have a strong opinion either way.
2. Is there a JIRA/other work tracking enabling this feature in YARN/trunk? There are "Not yet implemented" exceptions being thrown in the code that make me think someone is aware of that and there's a plan to fix, so I'm wondering where that is tracked.
Thanks,
Mostafa
-
Re: Fate of skipping-bad-records feature (MAPREDUCE-4840)
Harsh J 2012-12-03, 21:31
Hi,
On Tue, Dec 4, 2012 at 1:56 AM, Mostafa Elhemali <[EMAIL PROTECTED]> wrote:
> (Taking the discussion out of the JIRA so I can tap into the historical and
> other knowledge of the group)
>
> Hi all,
> I was looking through MR code in trunk (always a fun weekend activity) and
> was puzzled by the loose ends around the feature to skip bad records; by
> loose ends I mean there's a lot of code in there but it can't really work.
> I dug through JIRAs and found MAPREDUCE-1932, which I thought implied that
> the feature is now intentionally dead, so I filed MAPREDUCE-4840 to delete
> dead code and deprecate APIs. In that new JIRA, however, Harsh corrected me,
> pointing out that MAPREDUCE-1932 only applied to the new API. So my
> questions are:
>
> 1. Do we want to support the skip-bad-records feature for the old API in
> trunk? Personally I think it's a bit weird to tie this feature to which API
> you use, since the feature is configured by config file, not as part of the
> API, but I don't have a strong opinion either way.
I personally think we should not support it anymore.
There are some hard-bindings to this feature set right into MR runtime classes, which is why it's not been trivial to get it done in the new API as well.
> 2. Is there a JIRA/other work tracking enabling this feature in YARN/trunk?
> There are "Not yet implemented" exceptions being thrown in the code that
> make me think someone is aware of that and there's a plan to fix, so I'm
> wondering where that is tracked.
I'm not aware of anyone working on this. I did attempt it myself once, in the pre-MR2 days, but we wound up deciding that it is unsuitable for us to support this ourselves.
-- Harsh J