Please define blacklisting, graylisting, and excluded nodes in Hadoop 1.0.3
Dan F 2013-02-22, 06:28
At the top of the job tracker page in Hadoop, it reports blacklisted, graylisted, and excluded nodes. (We are using Amazon EMR AMI 2.3.1, which is Hadoop 1.0.3, I believe.)

The Hadoop docs <http://hadoop.apache.org/docs/r1.0.4/cluster_setup.html#Monitoring+Health+of+TaskTracker+Nodes> say nodes can be blacklisted by a monitoring script. They do not say whether there is a default monitoring script, or what it might do.

MapR <http://www.mapr.com/doc/display/MapR/mapred-site.xml#mapred-site.xml-mapred.max.tracker.blacklists> says a task tracker is blacklisted if a node is blacklisted by mapred.max.tracker.blacklists jobs. (It says a task tracker is blacklisted from a job if it is blacklisted mapred.max.tracker.failures times in a job.)

So which is it: the monitoring script; this blacklist-per-job, then across jobs; both; or some other mechanism? Is there a definitive source of this information?

If I look in Jira (MAPREDUCE-1966) and the source code (JobTracker.java), it looks as if nodes that would have been blacklisted as MapR describes (4 times in a job, then across 4 jobs) are now graylisted instead, because there was debate over the heuristics. However, it is unclear to me whether that change affects 1.0.3: "Fixed version" in Jira shows "unresolved."

And what about excluded nodes?

Please rigorously define blacklisted, graylisted, and excluded nodes for 1.0.3, preferably with a reference.

Thanks!