-
Is it necessary to run secondary namenode when starting HDFS?
Ivan Ryndin 2012-12-17, 17:04
Hi all,
is it necessary to run the secondary namenode when starting HDFS? I am using Hadoop 1.1.1. In the script $HADOOP_HOME/bin/start-dfs.sh I see the following lines:
    # start dfs daemons
    # start namenode after datanodes, to minimize time namenode is up w/o data
    # note: datanodes will log connection errors until namenode starts
    "$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode $nameStartOpt
    "$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode $dataStartOpt
    "$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters start secondarynamenode
So, will HDFS work if I don't start the secondarynamenode?
I ask because I am only experimenting with Hadoop on a two-node cluster (and the machines don't have much RAM or disk space), so I don't want to run unnecessary processes.
-- Best regards, Ivan P. Ryndin,
-
Re: Is it necessary to run secondary namenode when starting HDFS?
Harsh J 2012-12-17, 17:09
The SecondaryNameNode is necessary for automatic maintenance in long-running clusters (read: production), but it is neither necessary for, nor tied into, the basic functions/operations of HDFS.
On 1.x, you can remove the script's startup of SNN by removing its host entry from the conf/masters file. On 2.x, you can selectively start the NN and DNs by using the hadoop-daemon.sh script commands.
-- Harsh J
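Harsh's two suggestions can be sketched in shell as below. This is a minimal sketch rather than anything posted in the thread: the function names are hypothetical, the 1.x helper assumes conf/masters lists only the SecondaryNameNode host(s) (which is what start-dfs.sh reads via `--hosts masters`), and the 2.x helper assumes the stock tarball layout with the daemon scripts under $HADOOP_PREFIX/sbin.

```shell
#!/usr/bin/env bash

# Hadoop 1.x (hypothetical helper): start-dfs.sh launches a SecondaryNameNode
# on every host listed in conf/masters, so an empty masters file means no SNN
# is started. The original host list is kept as masters.bak for restoring.
disable_snn_hosts() {
  local conf_dir=$1
  local masters="$conf_dir/masters"
  [ -f "$masters" ] || return 0   # nothing to do if the file is absent
  cp "$masters" "$masters.bak"    # keep a copy of the original host list
  : > "$masters"                  # truncate to zero hosts
}

# Hadoop 2.x (hypothetical helper): start only the NameNode (on this host)
# and the DataNodes (on the hosts in the slaves file) instead of running
# start-dfs.sh. Assumes HADOOP_PREFIX and HADOOP_CONF_DIR are set.
start_nn_and_dns() {
  local sbin="$HADOOP_PREFIX/sbin"
  "$sbin/hadoop-daemon.sh" --config "$HADOOP_CONF_DIR" start namenode
  "$sbin/hadoop-daemons.sh" --config "$HADOOP_CONF_DIR" start datanode
}
```

On a 1.x install you would run `disable_snn_hosts "$HADOOP_HOME/conf"` once and then use start-dfs.sh as usual; copying masters.bak back over masters re-enables the SecondaryNameNode. Keeping a backup instead of deleting the host entry outright makes it easy to switch back for a production-like setup.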
-
Re: Is it necessary to run secondary namenode when starting HDFS?
Bryan Beaudreault 2012-12-17, 17:12
You don't need a secondary name node. It periodically checkpoints the name node metadata (merging the edits log into the fsimage), which keeps the edits file small. If you don't run one, your edits file will grow over time, and the next time you restart your namenode it could take a very long time to start up, because it has to replay all of those edits. I recommend running one in production to reduce downtime if you need to replace or restart your namenode. If that isn't a concern for you, then you don't need it.
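Bryan's point about the edits file can be made concrete. In Hadoop 1.x the SecondaryNameNode starts a checkpoint when either fs.checkpoint.period seconds have passed since the last one or the edits log reaches fs.checkpoint.size bytes (stock defaults: 3600 seconds and 64 MB). The sketch below models only that trigger rule plus a helper that reads the current edits size; the function names are hypothetical, they do not talk to a running cluster, and the dfs.name.dir/current/edits path is an assumption based on the 1.x on-disk layout.

```shell
#!/usr/bin/env bash

# Succeeds (exit 0) when a checkpoint would be due: the edits log has
# reached the size limit, or the period has elapsed since the last one.
# Defaults mirror stock 1.x settings: fs.checkpoint.period=3600 (seconds)
# and fs.checkpoint.size=67108864 (64 MB).
checkpoint_due() {
  local edits_bytes=$1 secs_since_last=$2
  local period=${3:-3600} size_limit=${4:-67108864}
  [ "$edits_bytes" -ge "$size_limit" ] || [ "$secs_since_last" -ge "$period" ]
}

# Prints the size in bytes of the edits log under a 1.x name directory
# (the value of dfs.name.dir); the current/edits path is assumed.
edits_size() {
  local name_dir=$1
  wc -c < "$name_dir/current/edits" | tr -d ' '
}
```

For example, `checkpoint_due "$(edits_size /data/dfs/name)" 600 && echo "checkpoint due"` (with /data/dfs/name as a placeholder for your dfs.name.dir) reports whether the edits log has outgrown the default size limit ten minutes after the last checkpoint. Without a SecondaryNameNode nothing acts on this condition, so the edits file keeps growing, which is the restart-time cost described above.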
-
Re: Is it necessary to run secondary namenode when starting HDFS?
Ivan Ryndin 2012-12-17, 17:22
Thank you very much!
It is now clear to me that I won't start the secondary namenode in development mode, but in production it's better to have it. Thanks!
Regards, Ivan
--
Best regards, Ivan P. Ryndin,
-
Re: Is it necessary to run secondary namenode when starting HDFS?
Ivan Ryndin 2012-12-17, 17:23
Thank you very much, Bryan!
It is now clear to me that I won't start the secondary namenode in development mode, but in production it's better to have it. Thanks!
Regards, Ivan
--
Best regards, Ivan P. Ryndin
-
Re: Is it necessary to run secondary namenode when starting HDFS?
Michael Segel 2012-12-17, 17:42
Hi,
Just a reminder... just because you can do something, or rather in this case not do something, doesn't mean that it's a good idea.
The SN is there for a reason. Maybe if you're on an EMR cluster that will be taken down at the end of the job or the end of the day, not running the SN is OK. Outside of that... running it is pretty much a good idea.
-Just saying...
-
Re: Is it necessary to run secondary namenode when starting HDFS?
Patai Sangbutsarakum 2012-12-17, 18:52
> is it necessary to run secondary namenode when starting HDFS?
I would say it's not necessary. I skipped it when I first played with Hadoop.
-
Re: Is it necessary to run secondary namenode when starting HDFS?
Mohammad Tariq 2012-12-17, 18:59
I agree with Michael. Skipping the SNN daemon is really a bad idea when you are dealing with something real.
Best Regards, Tariq +91-9741563634