-
Tasktracker appearing from "nowhere"
PeterAtReunion 2010-05-28, 01:47
I'm getting the following errors:
WARN org.apache.hadoop.mapred.JobTracker: Serious problem, cannot find record of 'previous' heartbeat for 'tracker_m351.ra.wink.com:localhost/127.0.0.1:41885'; reinitializing the tasktracker
INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201005271529_0004_r_000042_1' to tip task_201005271529_0004_r_000042, for tracker 'tracker_m351.ra.wink.com:localhost/127.0.0.1:41885'
INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201005271529_0004_m_000112_0' from 'tracker_m351.ra.wink.com:localhost/127.0.0.1:41885'
All this despite not having m351 in any of the config files except racks.txt. If I take it out of there, I can't start any jobs at all.
Question is - what would make a machine be contacted as a tasktracker when it is not in the slave or *.xml files?
Thanks -
;;peter
-
Re: Tasktracker appearing from "nowhere"
Hemanth Yamijala 2010-05-28, 09:51
Peter,
> I'm getting the following errors:
>
> WARN org.apache.hadoop.mapred.JobTracker: Serious problem, cannot find record of 'previous' heartbeat for 'tracker_m351.ra.wink.com:localhost/127.0.0.1:41885'; reinitializing the tasktracker
>
> INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201005271529_0004_r_000042_1' to tip task_201005271529_0004_r_000042, for tracker 'tracker_m351.ra.wink.com:localhost/127.0.0.1:41885'
>
> INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201005271529_0004_m_000112_0' from 'tracker_m351.ra.wink.com:localhost/127.0.0.1:41885'
>
> despite not having m351 in any of the config files except racks.txt.
> If I take it out of there I can't start any jobs at all.
>
> Question is - what would make a machine be contacted as a tasktracker when it is not in the slave or *.xml files?
If m351 has Hadoop and a mapred-site.xml or hadoop-site.xml pointing to the right JobTracker, it would register itself as a TaskTracker when Hadoop starts on it. The slave file is used primarily to start the daemons from a central place and is not a way to specify which nodes must join the Hadoop cluster.
Thanks
hemanth
-
Re: Tasktracker appearing from "nowhere"
PeterAtReunion 2010-05-28, 17:54
Hemanth -
Thanks for the insight on how the slaves file is used.
In my case there is no Hadoop running on the machine m351. In fact, there are no Java-based programs running on it at all. The machine was in the cluster for a short time (it was mistakenly in the slaves file during a start-all.sh invocation), but since then it has been completely purged from everywhere except the racks.txt file.
When it was *not* in the racks.txt file, no mapreduce jobs would start. Instead I got endless error loops of:
java.io.IOException: java.lang.NullPointerException
2010-05-27 00:00:00,339 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker 'tracker_m351.ra.wink.com:localhost/127.0.0.1:44063'
2010-05-27 00:00:00,413 WARN org.apache.hadoop.net.ScriptBasedMapping: Script /usr/local/bin/wk_rack.sh returned 0 values when 1 were expected.
2010-05-27 00:00:00,413 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 54311, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@6991c610, true, true, true, -1) from 10.3.0.138:34455: error: java.io.IOEx
This is our own ScriptBasedMapping for generating the hostDNS => rack NetworkTopology name mapping. This script is called with the m351 host name but I can't figure out why or where from.
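For reference, ScriptBasedMapping passes one or more host names or IP addresses to the topology script as arguments and expects one rack path back per argument, which is why empty output for m351 produces the "returned 0 values when 1 were expected" warning above. Below is a minimal sketch of a script that never returns fewer values than it was given. It assumes a hypothetical racks.txt layout of whitespace-separated "host rack" pairs (the real format used by wk_rack.sh isn't shown in this thread) and an assumed fallback rack name.

```python
#!/usr/bin/env python
"""Sketch of a rack-mapping script; not the wk_rack.sh from this thread."""
import os
import sys

# Assumed fallback rack for hosts missing from the table.
DEFAULT_RACK = "/default-rack"


def load_rack_table(lines):
    """Parse 'host rack' pairs, ignoring blank lines and '#' comments."""
    table = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) >= 2:
            table[parts[0]] = parts[1]
    return table


def resolve_racks(hosts, table, default=DEFAULT_RACK):
    """Return exactly one rack per host, falling back to `default`."""
    return [table.get(host, default) for host in hosts]


def main(argv, table_path="racks.txt"):
    # table_path is a hypothetical location; a missing file yields an empty table.
    table = {}
    if os.path.exists(table_path):
        with open(table_path) as fh:
            table = load_rack_table(fh)
    # One whitespace-separated rack per argument, in argument order.
    print(" ".join(resolve_racks(argv, table)))


if __name__ == "__main__":
    main(sys.argv[1:])
```

With a fallback like this, an unknown host such as m351 lands in the default rack instead of causing the heartbeat to fail. It does not explain why the host is being looked up in the first place.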
Any insights on who remembers topology between shutdowns and restarts? (A restart here consists of bin/stop-all.sh, a confirmation that all Java programs are stopped on all hosts on our network, and then bin/start-all.sh on the master NameNode, which seems to just walk the slaves file.)
;;peter
-
Re: Tasktracker appearing from "nowhere" - [SOLVED]
PeterAtReunion 2010-05-29, 02:21
The problem was some errant tasktrackers still running on hosts I thought were down. Once I stopped *all* tasktrackers, a fresh restart seemed to run cleanly.
Thanks again to Hemanth for giving me the clue that the slaves file is advisory only.
;;peter
-
Re: Tasktracker appearing from "nowhere"
Sudhir Vallamkondu 2010-06-01, 17:09
This is exactly why one would need to maintain a list of authorized nodes. Here's the excerpt from O'Reilly's "Hadoop: The Definitive Guide" book. The excerpt talks about datanodes, but it applies to tasktrackers as well.
"It is a potential security risk to allow any machine to connect to the namenode and act as a datanode, since the machine may gain access to data that it is not authorized to see. Furthermore, since such a machine is not a real datanode, it is not under your control, and may stop at any time, causing potential data loss. This scenario is a risk even inside a firewall, through misconfiguration, so datanodes (and tasktrackers) should be explicitly managed on all production clusters.
Datanodes that are permitted to connect to the namenode are specified in a file whose name is specified by the dfs.hosts property. The file resides on the namenode's local filesystem, and it contains a line for each datanode, specified by network address (as reported by the datanode; you can see what this is by looking at the namenode's web UI). If you need to specify multiple network addresses for a datanode, put them on one line, separated by whitespace. Similarly, tasktrackers that may connect to the jobtracker are specified in a file whose name is specified by the mapred.hosts property. In most cases, there is one shared file, referred to as the include file, that both dfs.hosts and mapred.hosts refer to, since nodes in the cluster run both datanode and tasktracker daemons.
The file (or files) specified by the dfs.hosts and mapred.hosts properties is different from the slaves file. The former is used by the namenode and jobtracker to determine which worker nodes may connect. The slaves file is used by the Hadoop control scripts to perform cluster-wide operations, such as cluster restarts. It is never used by the Hadoop daemons."
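An include file like the one described in the excerpt can also be used to audit a JobTracker log for trackers that should not be there. The sketch below is only an illustration: it assumes tracker names follow the 'tracker_<host>:...' pattern seen in the log lines earlier in this thread, and the function names and the host m101.ra.wink.com are made up for the example.

```python
import re

# Matches the host part of names like
# 'tracker_m351.ra.wink.com:localhost/127.0.0.1:41885'
TRACKER_RE = re.compile(r"'tracker_([^:']+):")


def tracker_hosts(log_lines):
    """Collect every host that appears as a tracker name in the log."""
    hosts = set()
    for line in log_lines:
        hosts.update(TRACKER_RE.findall(line))
    return hosts


def unexpected_trackers(log_lines, include_hosts):
    """Return, sorted, the tracker hosts that are not in the include list."""
    return sorted(tracker_hosts(log_lines) - set(include_hosts))


if __name__ == "__main__":
    sample = [
        "INFO org.apache.hadoop.mapred.JobTracker: Lost tracker "
        "'tracker_m351.ra.wink.com:localhost/127.0.0.1:44063'",
        # Hypothetical authorized host, for illustration only.
        "INFO org.apache.hadoop.mapred.JobTracker: Lost tracker "
        "'tracker_m101.ra.wink.com:localhost/127.0.0.1:50001'",
    ]
    print(unexpected_trackers(sample, ["m101.ra.wink.com"]))
```

A check like this only reports strays after the fact. Setting mapred.hosts is what actually keeps them from connecting.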