MapReduce >> mail # dev >> Fate of skipping-bad-records feature (MAPREDUCE-4840)


Fate of skipping-bad-records feature (MAPREDUCE-4840)
(Taking the discussion out of the JIRA so I can tap into the historical and
other knowledge of the group)

Hi all,
I was looking through MR code in trunk (always a fun weekend activity) and
was puzzled by the loose ends around the feature to skip bad records; by
loose ends I mean there's a lot of code in there but it can't really work.
I dug through the JIRAs and found MAPREDUCE-1932, which I thought implied that
the feature is now intentionally dead, so I filed MAPREDUCE-4840 to delete
the dead code and deprecate the APIs. In that new JIRA, however, Hash corrected me,
pointing out that MAPREDUCE-1932 only applied to the new API. So my
questions are:

1. Do we want to support the skip-bad-records feature for the old API in
trunk? Personally I think it's a bit weird to tie this feature to which API
you use since the feature is configured by config file, not as part of the
API, but I don't have a strong opinion either way.
2. Is there a JIRA or other work tracking enabling this feature in YARN/trunk?
There are "Not yet implemented" exceptions being thrown in the code, which
makes me think someone is aware of the gap and there's a plan to fix it, so I'm
wondering where that is tracked.

Thanks,
Mostafa