Re: Typical hardware configurations
Andrew Purtell 2009-03-30, 00:11
Hello Amandeep,

A basic rule of thumb is 1 core and 1 GB RAM per JVM. The Hadoop and HBase daemons will all need such an allocation. You can extend this to the mapreduce subsystem when considering how many mappers and/or reducers can concurrently execute on each node alongside the rest of what you are running. Or, you can choose to partition your hardware to support separate HDFS and HBase from the mapreduce task runners, as some do, which changes the situation. Lots of people try to run all-in-one clusters, where all functions are more or less co-located on every node.

Strictly speaking, how much heap a TaskTracker map or reduce task child will require depends on the user application. But it still loads the CPU, so I still use the 1 CPU/1 GB rule of thumb even for these. Overload your CPU resources and the JVM scheduler will starve threads, introducing spurious heartbeat misses, timeouts, and recovery behaviors in system daemons that will unnecessarily degrade performance and operation.

One thing I have considered but not tried is using Linux CPU affinity masks to put system functions in one partition and all user mapreduce tasks in the other. Another option, as I mentioned, is to split hardware resources among the functions.

Here is what I have used in the past in a successful all-in-one deployment. In parentheses next to each Java process name is the heap allocation reserved with -Xmx.

  1: NameNode (2000) and DataNode (1000)
  1: HMaster (1000), JobTracker (1000), and DataNode (1000)
  23: DataNode (1000), HRegionServer (2000), TaskTracker (1000), and the concurrency limit for mappers and reducers set to 4 and 4, respectively.

We picked a midpoint between cheap hardware and big iron. Our per-node spec was dual quad core, 4/8 GB RAM, 6x 1TB disk. 2x1TB hosted the system volume in RAID-1 configuration. The remaining 4x1TB drives were attached as JBOD and used as DataNode data volumes.
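
As a rough illustration, the worker-node layout listed above can be tallied against the 1 core / 1 GB per JVM rule of thumb. This is only a sketch: the helper name node_budget is hypothetical, the heap figures are assumed to be megabytes, and "dual quad core" is taken as 8 cores. Task child heap is left out because it depends on the user application.

```python
# Hypothetical helper (not part of Hadoop or HBase): counts JVMs and
# daemon heap on one node under the "1 core and 1 GB per JVM" rule.

def node_budget(daemon_heaps_mb, task_slots, cores):
    """Return (jvm_count, daemon_heap_mb, spare_cores) for one node."""
    # Every daemon and every concurrently running task child is one JVM.
    jvm_count = len(daemon_heaps_mb) + task_slots
    # Heap reserved with -Xmx for the daemons only (assumed to be MB).
    daemon_heap_mb = sum(daemon_heaps_mb.values())
    # Negative means more JVMs than cores when every slot is busy.
    spare_cores = cores - jvm_count
    return jvm_count, daemon_heap_mb, spare_cores


# One of the 23 worker nodes: 4 mapper slots plus 4 reducer slots.
worker = {"DataNode": 1000, "HRegionServer": 2000, "TaskTracker": 1000}
jvms, heap_mb, spare = node_budget(worker, task_slots=4 + 4, cores=8)
print(jvms, heap_mb, spare)
```

A negative spare-core figure only says that all slots being busy at once would exceed one core per JVM. How often that happens depends on the workload, so the number is a starting point for sizing the slot limits, not a hard constraint.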
The rationale for using so much disk per node was maximization of cluster/rack density.

As the size of your HDFS volume increases, you'll need to grow the heap allocation of your NameNode accordingly. In all my time running HBase I never needed more than 2GB allocated to it, but I hear that Facebook runs a NameNode with a 20GB heap.

A word of warning, however: currently HBase is a very challenging user of HDFS. In 0.20 there are some changes (HFile) which lessen somewhat the number of open files and should also lower the total number of DataNode xceivers necessary to support operations. However, on my 25 node cluster running Hadoop/HBase 0.19, I found it necessary to increase the DataNode xceiver limit to 4096 (from its default of 512!) to successfully bootstrap an HBase cluster with > 7000 regions. Therefore it may not be the per-node spec that is the determining factor for the stability of your cluster, but rather the number of DataNodes employed to sufficiently spread the load.

Hope that helps,

- Andy

> From: Amandeep Khurana <[EMAIL PROTECTED]>
> Subject: Typical hardware configurations
> To: [EMAIL PROTECTED], [EMAIL PROTECTED]
> Date: Friday, March 27, 2009, 10:07 PM
>
> What are the typical hardware config for a node that people
> are using for Hadoop and HBase? I am setting up a new 10 node
> cluster which will have HBase running as well that will be
> feeding my front end directly. Currently, I had a 3 node
> cluster with 2 GB of RAM on the slaves and 4 GB of RAM on the
> master. This didnt work very well due to the RAM being a
> little low.
>
> I got some config details from the powered by page on the
> Hadoop wiki, but nothing like that for Hbase.
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz