-
rack topology data update
Jameson Li 2012-09-13, 03:03
Our hadoop version is hadoop-0.20-append+4.
We have configured rack awareness on the namenode. But when I add a new datanode, update the topology data file, and restart the datanode, the namenode log only shows:

2012-09-13 10:35:25,074 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/ipc:50010

So should I restart the namenode? Is there a command like 'hadoop dfsadmin -refreshtopology'?
My configuration:
*core-site.xml:*
<property>
  <name>topology.script.file.name</name>
  <value>conf/rack-awareness.sh</value>
</property>
<property>
  <name>topology.script.number.args</name>
  <value>1000</value>
</property>
*conf/rack-awareness.sh:*
#!/bin/sh

HADOOP_CONF=/opt/hadoop/conf

while [ $# -gt 0 ] ; do
  nodeArg=$1
  exec< ${HADOOP_CONF}/topology.data
  result=""
  while read line ; do
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] ; then
      result="${ar[1]}"
    fi
  done
  shift
  if [ -z "$result" ] ; then
    echo -n "/default-rack "
  else
    echo -n "$result "
  fi
done
topology.data:
ipa rackA
ipb rackA
ipc rackB

I also searched the mailing list for "topology.script.file update" and found this message:

Tom Hall 2011-10-27, 16:07
> I was hoping that if I updated the file it would give new answers as
> datanodes were restarted and reconnected but that does not seem to be
> the case.
> Surely I dont need to restart the namenode...

But nobody replied to it. Can somebody help me?

Focused on MySQL, MSSQL, Oracle, Hadoop
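The lookup in the script above can be sketched in a more portable form. This is only an illustration of the same logic, not a tested replacement: the array syntax `ar=( $line )` in the original needs bash, so this version avoids arrays and runs under a plain POSIX sh. `TOPOLOGY_FILE` and `resolve_racks` are names made up for this sketch; they are not Hadoop settings.

```shell
#!/bin/sh
# Sketch only. TOPOLOGY_FILE and resolve_racks are illustrative names.
TOPOLOGY_FILE="${TOPOLOGY_FILE:-/opt/hadoop/conf/topology.data}"

resolve_racks() (
  for node in "$@"; do
    # Unknown hosts fall back to the default rack, as in the original script.
    result="/default-rack"
    # Each line is "host rack"; the last matching line wins.
    # The "|| [ -n ... ]" part also handles a final line without a newline.
    while read -r host rack _ || [ -n "$host" ]; do
      if [ "$host" = "$node" ] && [ -n "$rack" ]; then
        result="$rack"
      fi
    done < "$TOPOLOGY_FILE"
    printf '%s ' "$result"
  done
  printf '\n'
)

# Hadoop passes the hosts to resolve as arguments.
if [ $# -gt 0 ]; then
  resolve_racks "$@"
fi
```

Running it by hand with a known host and an unknown one is a quick way to confirm that the file is being read before involving the namenode at all.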
-
Re: rack topology data update
Harsh J 2012-09-13, 03:32
Jameson,
The right process to add a new node with the right mapping is:
1. Update topology file for the new DN.
2. Issue a dfsadmin -refreshNodes to get new topology mapping updated in NN.
3. Start the DN only after (2) so it picks up the right mapping and a default mapping does not get cached.
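The three steps above could be wrapped roughly as follows. This is a sketch under assumptions: `register_datanode`, `TOPOLOGY_FILE`, and `HADOOP_CMD` are hypothetical names chosen here, the topology file is assumed to hold one "host rack" pair per line, and starting the DN is left to the operator because it happens on a different machine.

```shell
#!/bin/sh
# Sketch only. register_datanode, TOPOLOGY_FILE and HADOOP_CMD are illustrative names.
TOPOLOGY_FILE="${TOPOLOGY_FILE:-/opt/hadoop/conf/topology.data}"
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

register_datanode() {
  host="$1"
  rack="$2"
  # Step 1: add the mapping unless the host is already listed (exact match on column 1).
  if ! awk -v h="$host" '$1 == h { found = 1 } END { exit !found }' "$TOPOLOGY_FILE" 2>/dev/null; then
    printf '%s %s\n' "$host" "$rack" >> "$TOPOLOGY_FILE"
  fi
  # Step 2: run the refresh named in step 2 above.
  "$HADOOP_CMD" dfsadmin -refreshNodes || return 1
  # Step 3: not automated here; the DN must only be started after this point.
  echo "Mapping for $host is in place; start the datanode on $host now."
}
```

Usage would look like `register_datanode ipc rackB` on the NN host, before the new DN has ever contacted the NN.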
-- Harsh J
-
Re: rack topology data update
Jameson Li 2012-09-13, 05:51
Hi harsh,
I followed the steps you suggested:

1. Stop the new datanode. (I had already modified the topology file on the namenode.)
2. Run 'hadoop dfsadmin -refreshNodes' on the namenode.
3. Start the new datanode.

But the topology mapping is still not updated. When the datanode starts, the namenode log only shows:

2012-09-13 13:44:14,706 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/10.0.10.100:50010
2012-09-13 13:44:14,706 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.0.10.100:50010
-
Re: rack topology data update
Viji R 2012-09-13, 05:55
Hi Jameson,
If the NameNode has cached the wrong value earlier, it will not refresh that until you restart it.
-
Re: rack topology data update
Saurabh bhutyani 2012-09-13, 06:22
I believe running the following command on the namenode should refresh it.
'hadoop dfsadmin -refreshNodes'
Thanks & Regards, Saurabh Bhutyani
-
Re: rack topology data update
Jameson Li 2012-09-13, 07:19
Hi Harsh J, Viji R, Saurabh bhutyani,
Thanks for all of your replies.
But the namenode really does not refresh the rack info.
Is this an issue with my hadoop version? It is based on hadoop-0.20-append with 4 patches applied, and I don't think those patches have anything to do with rack awareness.
If I have to restart the namenode to refresh the rack info every time I add new nodes, I will go crazy...
-
Re: rack topology data update
Harsh J 2012-09-13, 08:03
Hi Jameson,
As I'd mentioned, due to the current behavior, if the NN has already cached a bad topology mapping value, it will not forget it despite a -refreshNodes command. Otherwise, there's no problem. That is, if you've done the following in this order, the NN may require a restart:

1. Start the new DN. (At this point, the DN gets mapped to the default rack as there's no entry for it, and this is cached.)
2. Update the topology file and do refreshNodes.
This should be fixed in one of the 2.x releases, where we also refresh the cached values.
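To see which DNs the NN has already placed on the default rack, one option is to scan the NN log for the NetworkTopology lines quoted earlier in this thread. The sketch below assumes the log format is exactly as in those lines, with the rack path as the last field; `default_rack_nodes` is a made-up helper name, and the log only reflects what was registered, not the cache contents directly.

```shell
#!/bin/sh
# Sketch only. default_rack_nodes is an illustrative helper, not a Hadoop command.
default_rack_nodes() {
  awk '/NetworkTopology: Adding a new node: / {
         path = $NF                      # e.g. /default-rack/10.0.10.100:50010
         n = split(path, parts, "/")
         node = parts[n]                 # host:port
         rack = substr(path, 1, length(path) - length(node) - 1)
         last[node] = rack               # keep only the most recent mapping
       }
       END {
         for (node in last)
           if (last[node] == "/default-rack") print node
       }' "$1" | sort
}
```

Called as `default_rack_nodes /path/to/namenode.log`, it prints one host:port per line for every node whose latest registration landed on /default-rack.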
-- Harsh J
-
Re: rack topology data update
Steve Loughran 2012-09-13, 09:10
On 13 September 2012 09:03, Harsh J <[EMAIL PROTECTED]> wrote:
> This should be fixed in one of the 2.x releases, where we also refresh
> the cached values.

Really? Which JIRA?
I've been making changes to the topology logic so you can do some preflight checking and dump the topologies, but didn't think a clear and reload was in there. Some decisions on block placement strategy (flat vs hierarchical) are made early on, so going from flat to multi-switch is not something I'd recommend.
-
Re: rack topology data update
Harsh J 2012-09-13, 09:15
Hey Steve,
True about the decisions part, that still needs the ugly fixing of re-replication.
I think I saw it on a JIRA by Patrick A. that was trying to change the way we did this. But placement is also pluggable in 2.x right? Let me find that JIRA (but yeah, am unsure if it was committed).
-- Harsh J
-
Re: rack topology data update
Harsh J 2012-09-14, 04:28
Jameson,
Yes, unfortunately this is currently the case.

On Fri, Sep 14, 2012 at 8:58 AM, Jameson Li <[EMAIL PROTECTED]> wrote:
> Harsh J,
>
> If a new datanode has joined the cluster, and has a default rack info cached
> in the namenode, it will no way to change its cache other than restart the
> namenode.
> Am I right?
-- Harsh J