On Wed, Jul 14, 2010 at 6:24 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
> Is there a reason why 5 is the recommend number of servers (minimum) in a
> cluster?
>
> Why not 2 or 3?
>
> Just asking because 5 large ec2 instances (7.5gb ram) isn't *that* cheap :)
>
> Thanks!
Most of the answer is tied into the architecture of Hadoop. Most people set dfs.replication to 3, which calls for at least three DataNodes. You do not want your NameNode running on the same physical hardware as a DataNode, so that is already 4 machines. You may also want a dedicated machine for ZooKeeper and the HBase master, so that is 5.
Performance-wise, however, with dfs.replication = 3 on 3 nodes you do not have the 'critical mass' of servers needed for the scale-out effect. At replication 3 with 3 nodes, every action (put, get) has some effect on all servers. With replication 3 and 10 nodes, a single put or get affects only roughly 30% of your cluster; with 100 nodes, 3/100 = 3%, and so on.
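To make the arithmetic above concrete, here is a rough sketch in Python. It assumes the simplest possible model: each block lands on `replication` distinct DataNodes, so one put or get involves about replication / nodes of the cluster. The function name is made up for illustration, and real HDFS block placement and HBase region assignment are more involved than this.

```python
def fraction_of_cluster_touched(replication: int, nodes: int) -> float:
    """Rough share of DataNodes involved in a single put or get.

    Simplifying assumption: one operation touches one block, and that
    block is stored on `replication` distinct DataNodes.
    """
    if replication < 1 or nodes < 1:
        raise ValueError("replication and nodes must both be at least 1")
    # A block cannot be on more nodes than the cluster has.
    return min(replication, nodes) / nodes


if __name__ == "__main__":
    for nodes in (3, 10, 100):
        share = fraction_of_cluster_touched(3, nodes)
        print(f"{nodes:>3} nodes, replication 3: about {share:.0%} of the cluster")
```

At 3 nodes the share is the whole cluster, which is the point of the 'critical mass' remark: adding nodes is what lets the load of any one operation spread out.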