Hadoop >> mail # user >> SkipBadRecords confusion


SkipBadRecords confusion
Some of my inputs fail deterministically and I would like to avoid trying them four times.  There seem to be two approaches, setMaxMapAttempts() and SkipBadRecords, and I'm trying to figure both of them out.  Currently, mapred.map.max.attempts is configured as final, so I can't change it, which is why I'm trying to get SkipBadRecords to work.  I currently have this:

SkipBadRecords.setMapperMaxSkipRecords(conf, 1);
SkipBadRecords.setAttemptsToStartSkipping(conf, 1);

Note that I have set up my Hadoop job such that each input gets its own mapper; put differently, each map task has only one input record to process and makes only one call to the map() method.  Therefore, I would expect the SkipBadRecords configuration above to force Hadoop to attempt each input only once, since there is no "range" to narrow in on (there being a single input record), but it seems to have no effect whatsoever.  Each mapper is still tried the default four times.  Hadoop doesn't seem to detect and exclude the one bad record and bail on the rest of the task attempt (since there are no other inputs to process).
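
For anyone checking whether these settings actually reached the job, here is a minimal sketch of the two SkipBadRecords calls above written out as plain configuration properties, so the keys can be compared against the submitted job's configuration.  The class SkipSettingsSketch is a hypothetical helper, not part of Hadoop, and the two property names are an assumption based on the Hadoop 0.20/1.x defaults; please verify them against your own release.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: it only builds key/value pairs
// and never talks to a cluster.
class SkipSettingsSketch {

    // Assumed property names (Hadoop 0.20/1.x); check them for your version.
    static final String ATTEMPTS_TO_START_SKIPPING = "mapred.skip.attempts.to.start.skipping";
    static final String MAPPER_MAX_SKIP_RECORDS = "mapred.skip.map.max.skip.records";

    // Mirrors setMapperMaxSkipRecords(conf, n) and setAttemptsToStartSkipping(conf, n).
    static Map<String, String> asProperties(long maxSkipRecords, int attemptsToStartSkipping) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(MAPPER_MAX_SKIP_RECORDS, Long.toString(maxSkipRecords));
        props.put(ATTEMPTS_TO_START_SKIPPING, Integer.toString(attemptsToStartSkipping));
        return props;
    }

    public static void main(String[] args) {
        // Print the settings from this post as key=value lines.
        for (Map.Entry<String, String> e : asProperties(1L, 1).entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

If either key is missing from the job's configuration, or carries a different value, the calls were probably made on a different configuration object than the one submitted with the job.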

Any ideas why this is happening?  How can I get it to try the input only once and then give up?  These repeated attempts are holding up the reducer and therefore the entire job.

Thanks.

________________________________________________________________________________
Keith Wiley               [EMAIL PROTECTED]               www.keithwiley.com

"Luminous beings are we, not this crude matter."
  -- Yoda
________________________________________________________________________________