MapReduce >> mail # dev >> Fate of skipping-bad-records feature (MAPREDUCE-4840)


Re: Fate of skipping-bad-records feature (MAPREDUCE-4840)
Hi,

On Tue, Dec 4, 2012 at 1:56 AM, Mostafa Elhemali
<[EMAIL PROTECTED]> wrote:
> (Taking the discussion out of the JIRA so I can tap into the historical and
> other knowledge of the group)
>
> Hi all,
> I was looking through MR code in trunk (always a fun weekend activity) and
> was puzzled by the loose ends around the feature to skip bad records; by
> loose ends I mean there's a lot of code in there but it can't really work.
> I dug through JIRAs and found MAPREDUCE-1932, which I thought implied that
> the feature is now intentionally dead, so I filed MAPREDUCE-4840 to delete
> dead code and deprecate APIs. In that new JIRA, however, Harsh corrected me,
> pointing out that MAPREDUCE-1932 only applied to the new API. So my
> questions are:
>
> 1. Do we want to support the skip-bad-records feature for the old API in
> trunk? Personally I think it's a bit weird to tie this feature to which API
> you use since the feature is configured by config file, not as part of the
> API, but I don't have a strong opinion either way.

I personally think we should not support it anymore.

There are some hard bindings to this feature set right in the MR runtime
classes, which is why it has not been trivial to get it done in the new
API either.

> 2. Is there a JIRA/other work tracking enabling this feature in Yarn/trunk?
> There are "Not yet implemented" exceptions being thrown in the code that
> make me think someone is aware of that and there's a plan to fix it, so I'm
> wondering where that is tracked.

I'm not aware of anyone working on this. I did attempt it myself once,
in the pre-MR2 days, but we wound up deciding it was unsuitable to support
this ourselves.

--
Harsh J