Hadoop >> mail # user >> conf.setMaxMapAttempts, SkipBadRecords, etc.


conf.setMaxMapAttempts, SkipBadRecords, etc.
Let's say I want to ditch an input record the very first time it fails (because I know it is a deterministic data-dependent failure) instead of retrying it the default four times.  I have already experimented with conf.setMaxMapAttempts() with no success.  For example, consider the following:

int maxMapAttempts = conf.getMaxMapAttempts();  // returns the default, 4
conf.setMaxMapAttempts(1);
maxMapAttempts = conf.getMaxMapAttempts();      // now returns 1

Before calling conf.setMaxMapAttempts(1), getMaxMapAttempts() returns the default, 4; after the call, it returns 1.  However, despite that encouraging feedback, it doesn't work: the Hadoop job still runs each failed map task up to four times.  Furthermore, I have confirmed that the job.xml file on the job tracker contains the following:

mapred.map.max.attempts = 4

...which proves it really didn't change mapred.map.max.attempts!  I also added the following to my mapred-site.xml file:

<property>
    <name>mapred.map.max.attempts</name>
    <value>1</value>
    <final>true</final>
    <description>Max map attempts.
    </description>
</property>
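For reference, here is the minimal driver I believe should carry the setting through, assuming the old org.apache.hadoop.mapred API and that this same JobConf instance is the one actually submitted; the class name and the input/output paths are just placeholders:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SingleAttemptJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SingleAttemptJob.class);
        conf.setJobName("single-attempt");

        // Set the limit on the SAME JobConf instance that gets submitted,
        // and before JobClient.runJob(); my understanding is that changes
        // made to a different conf, or after submission, never reach the
        // job tracker's job.xml.
        conf.setMaxMapAttempts(1);  // writes mapred.map.max.attempts=1

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```

(If anyone can spot where a setting applied like this would get dropped or overridden before job.xml is written, that alone would answer my question.)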

With that property in mapred-site.xml, the initial call to conf.getMaxMapAttempts() returns 1, not 4, just as expected.  Nonetheless, the job.xml file on the job tracker reports that the value has reverted to 4 once again.  I have sought a solution to this problem for a long time and have concluded that no one knows how to fix it (if you have any ideas, PLEASE let me know), so I'm moving on to a different approach.  I am now trying the following:

SkipBadRecords.setMapperMaxSkipRecords(conf, 1);
SkipBadRecords.setAttemptsToStartSkipping(conf, 1);

First, can anyone confirm that this is the correct set of calls to make SkipBadRecords skip a record after its first failure?

Second, this doesn't work either!  My map tasks still restart four times.
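For reference, the fuller skip-mode setup I am attempting looks like the following, assuming the old mapred API; the property names in the comments are the ones I believe these setters write:

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkipModeSetup {
    public static void configure(JobConf conf) {
        // Enter skip mode after the first failed attempt
        // (mapred.skip.attempts.to.start.skipping).
        SkipBadRecords.setAttemptsToStartSkipping(conf, 1);

        // A value of 1 should force the framework to isolate exactly one
        // bad record rather than skipping a wider range
        // (mapred.skip.map.max.skip.records).
        SkipBadRecords.setMapperMaxSkipRecords(conf, 1);
    }
}
```

One thing I'm unsure about (please correct me): my understanding is that skip mode narrows down the bad record by re-running the task, so the narrowing itself consumes task attempts.  If that's right, a max-attempts value of 1 would leave the framework no retries in which to skip anything, and the two settings would work against each other.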

I'm really desperate on this and so far my research has turned up nothing.  I would greatly appreciate any help on this matter.

Thank you.

________________________________________________________________________________
Keith Wiley               [EMAIL PROTECTED]               www.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
  -- Edwin A. Abbott, Flatland
________________________________________________________________________________