

RE: Can we replace namenode machine with some other machine ?

I agree with Steve except on one thing...

RAID 5 bad. RAID 10 (1+0) good.

Sorry, this goes back to my RDBMS days, where RAID 5 will kill your performance and worse...
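
For anyone wondering where the performance difference comes from, here is a rough back-of-the-envelope sketch. It assumes the conventional rule-of-thumb write penalties for small random writes (4 backend I/Os per write on RAID 5, 2 on RAID 10) and ignores controller caches and full-stripe writes. The disk count, per-disk IOPS and write fraction are made-up illustration values, not measurements from any namenode.

```python
# Rule-of-thumb backend I/Os needed per small random write (an assumption,
# not a measurement): RAID 5 reads data + parity, then writes both; RAID 10
# writes to both sides of the mirror.
WRITE_PENALTY = {"raid5": 4, "raid10": 2}


def effective_iops(disks, iops_per_disk, write_fraction, level):
    """Estimate host-visible IOPS for a mixed random read/write workload."""
    raw = disks * iops_per_disk
    penalty = WRITE_PENALTY[level]
    # Each read costs one backend I/O; each write costs `penalty` backend I/Os.
    return raw / ((1 - write_fraction) + write_fraction * penalty)


if __name__ == "__main__":
    # Hypothetical array: 8 disks at 150 IOPS each, 70% writes.
    for level in ("raid5", "raid10"):
        print(level, round(effective_iops(8, 150, 0.7, level)))
```

With a read-only workload the two levels come out the same in this model; the gap opens up as the write fraction grows.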

> Date: Thu, 22 Sep 2011 11:28:39 +0100
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: Can we replace namenode machine with some other machine ?
>
> On 22/09/11 05:42, praveenesh kumar wrote:
> > Hi all,
> >
> > Can we replace our namenode machine later with some other machine?
> > Actually, I got a new server machine in my cluster, and now I want to make
> > this machine my new namenode and jobtracker node.
> > Also, does the Namenode/JobTracker machine's configuration need to be better
> > than the datanodes'/tasktrackers'?
> >
>
> 1. I'd give it lots of RAM: it holds data about many files, and it
> must avoid swapping.
>
> 2. I'd make sure the disks are RAID5, with some NFS-mounted FS that the
> secondary namenode can talk to. This avoids the risk of losing the index,
> which, if it happens, renders your filesystem worthless. If I were really
> paranoid I'd have twin RAID controllers with separate connections to
> disk arrays in separate racks, as [Jiang2008] shows that interconnect
> problems on disk arrays can be more frequent than HDD failures.
>
> 3. If your central switches are at 10 GbE, consider getting a 10 GbE NIC
> and hooking it up directly. This stops the network being the bottleneck,
> though it does mean the server can have a lot more packets hitting it,
> which puts more load on it.
>
> 4. Leave space for a second CPU and time for GC tuning.
>
>
> JTs are less important; they need RAM but use HDFS for storage. If your
> cluster is small, the NN and JT can run on the same machine. If you do
> this, set up DNS so that two hostnames point to the same network address.
> Then if you ever split them off, everyone whose bookmark says
> http://jobtracker won't notice.
>
> Either way: the NN and the JT are the machines whose availability you
> care about. The rest is just a source of statistics you can look at later.
>
> -Steve
>
>
>
> [Jiang2008] "Are disks the dominant contributor for storage failures?: A
> comprehensive study of storage subsystem failure characteristics". ACM
> Transactions on Storage.
>
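
Steve's tip about giving the NN and JT two hostnames on one address can be sanity-checked with a few lines of Python. This is only a sketch: the hostnames `namenode` and `jobtracker` and the address below are hypothetical, and the demo uses a dictionary in place of real DNS so it runs anywhere. Leaving out the `resolve` argument makes it use `socket.gethostbyname` against your actual resolver.

```python
import socket


def same_address(host_a, host_b, resolve=socket.gethostbyname):
    """Return True when both hostnames resolve to the same IPv4 address."""
    return resolve(host_a) == resolve(host_b)


if __name__ == "__main__":
    # Stand-in for DNS: two hypothetical names pointing at one machine.
    fake_dns = {"namenode": "10.0.0.5", "jobtracker": "10.0.0.5"}
    print(same_address("namenode", "jobtracker", resolve=fake_dns.__getitem__))
```

While both services share a machine this should report True; once the JT is moved and its DNS record is updated, the same check reports False and confirms that clients using the jobtracker name now reach the new host.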