MapReduce, mail # user - Please define blacklisting, graylisting, and excluded nodes in Hadoop 1.0.3


Please define blacklisting, graylisting, and excluded nodes in Hadoop 1.0.3
Dan F 2013-02-22, 06:28
At the top of its web page, the job tracker in Hadoop reports blacklisted,
graylisted, and excluded nodes. (We are using Amazon EMR AMI 2.3.1, which
I believe is Hadoop 1.0.3.)

The Hadoop docs<http://hadoop.apache.org/docs/r1.0.4/cluster_setup.html#Monitoring+Health+of+TaskTracker+Nodes>
say nodes can be blacklisted by a monitoring script. They do not say whether
there is a default monitoring script, or what it might do.
The MapR docs<http://www.mapr.com/doc/display/MapR/mapred-site.xml#mapred-site.xml-mapred.max.tracker.blacklists>
say a task tracker is blacklisted once mapred.max.tracker.blacklists jobs
have blacklisted it. (They say a job blacklists a task tracker once
mapred.max.tracker.failures of that job's tasks have failed on it.)
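
To make the question concrete, here is a toy model of the two-level rule as
I read it from the description above. It is only a sketch, not Hadoop's
actual code: the class and method names are made up, the thresholds of 4
are the values mentioned later in this message, and whether the real
comparison is ">=" or ">" is an assumption.

```python
from collections import defaultdict

# Values taken from "4 times in a job, then across 4 jobs" (assumed defaults).
MAX_TRACKER_FAILURES = 4    # stands in for mapred.max.tracker.failures
MAX_TRACKER_BLACKLISTS = 4  # stands in for mapred.max.tracker.blacklists


class BlacklistModel:
    """Hypothetical model of per-job, then cross-job, blacklisting."""

    def __init__(self):
        # (job_id, tracker) -> number of that job's tasks that failed there
        self.failures = defaultdict(int)
        # tracker -> set of job ids that have blacklisted it
        self.job_blacklists = defaultdict(set)

    def record_task_failure(self, job_id, tracker):
        """Count one failed task; blacklist per job at the threshold."""
        self.failures[(job_id, tracker)] += 1
        # Assumption: the tracker is blacklisted on reaching the threshold.
        if self.failures[(job_id, tracker)] >= MAX_TRACKER_FAILURES:
            self.job_blacklists[tracker].add(job_id)

    def blacklisted_for_job(self, job_id, tracker):
        """True if this job no longer assigns tasks to the tracker."""
        return job_id in self.job_blacklists.get(tracker, set())

    def blacklisted_cluster_wide(self, tracker):
        """True once enough distinct jobs have blacklisted the tracker."""
        jobs = self.job_blacklists.get(tracker, set())
        return len(jobs) >= MAX_TRACKER_BLACKLISTS
```

Under this reading, a tracker that fails 4 tasks of one job is blacklisted
for that job only, and it becomes blacklisted for the whole cluster after
that has happened in 4 different jobs. The monitoring script would be a
separate path that this model does not cover.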

So which is it: the monitoring script; this per-job, then cross-job,
blacklisting; both; or some other mechanism? Is there a definitive source
for this information?

Judging from Jira (MAPREDUCE-1966) and the source code (JobTracker.java),
it looks as if the blacklisting that MapR describes (4 times in a job, then
across 4 jobs) was changed to graylisting because there was debate over
the heuristics. However, it's unclear to me whether that change affects
1.0.3. "Fixed version" in Jira shows "unresolved."

And what about excluded?

Please rigorously define blacklisting, graylisting, and excluded nodes for
1.0.3, preferably with a reference. Thanks!