HBase, mail # user - DataNode Hardware


Re: DataNode Hardware
Amandeep Khurana 2012-07-12, 20:26
The issue with having fewer cores per box is that you are co-locating the DataNode, RegionServer, TaskTracker, and then the MR tasks themselves, and you need a core for the OS too. All of these have to run on a single node, so you need a minimum amount of resources that can handle all of this well. I don't see how you will be able to do compute-heavy work with 4 cores: even if you give 1 to the OS, 1 to the DataNode and TaskTracker processes, and 1 to the RegionServer, you are left with only 1 core for the actual tasks to run.
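To make the arithmetic above concrete, here is a hypothetical core-budget sketch for a co-located Hadoop/HBase node; the names and per-daemon reservations are illustrative, not from the thread:

```python
# Illustrative core budget for a 4-core node running all co-located daemons.
total_cores = 4

reserved = {
    "os": 1,                    # operating system
    "datanode_tasktracker": 1,  # HDFS DataNode + MR TaskTracker daemons
    "regionserver": 1,          # HBase RegionServer
}

# Whatever is left over is all the MR tasks get.
mr_task_cores = total_cores - sum(reserved.values())
print(mr_task_cores)  # 1
```

With only one core left for actual task slots, any compute-heavy MR job is starved from the start.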

Also, if you really want reliably low-latency access to data, I would separate the MR framework onto its own cluster and put HBase on an independent cluster. The MR framework will still talk to the HBase cluster for lookups. You'll still benefit from the caching, etc., but HBase will be able to guarantee performance better.

-Amandeep
On Thursday, July 12, 2012 at 1:20 PM, Bartosz M. Frak wrote:

> Amandeep Khurana wrote:
> > Inline.
> >
> >
> > On Thursday, July 12, 2012 at 12:56 PM, Bartosz M. Frak wrote:
> >
> > > Quick question about DataNode hardware. I've read a few articles that
> > > cover the basics, including the Cloudera's recommendations here:
> > > http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/
> > >
> > > The article is from early 2010, but I'm assuming that the general
> > > guidelines haven't deviated much from the recommended baselines. I'm
> > > skewing my build towards the "Compute optimized" side of the spectrum,
> > > which calls for a 1:1 core-to-spindle ratio and more RAM per node
> > > for in-memory caching.
> > >
> > >
> >
> > Why are you skewing towards compute optimized? Are you expecting to run compute-intensive MR interacting with HBase tables?
> >
> >
>
> Correct. We'll be storing dense raw numerical time-series data, which will
> need to be transformed (decimated, FFTed, correlated, etc.) with
> relatively low latency (under 10 seconds). We also expect repeated
> reads, where the same piece of data is looked at more than once in a
> short amount of time. This is where we are hoping that in-memory caching
> and data node affinity can help us.
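As a minimal sketch of two of the transforms mentioned above (decimation and correlation), assuming nothing about the actual pipeline; the function names and sample data here are purely illustrative:

```python
# Hypothetical sketches of two time-series transforms from the discussion.

def decimate(samples, factor):
    """Downsample by keeping every `factor`-th sample."""
    return samples[::factor]

def correlate(a, b):
    """Naive zero-lag correlation: dot product of equal-length series."""
    return sum(x * y for x, y in zip(a, b))

series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(decimate(series, 2))       # [0.0, 2.0, 4.0, 6.0]
print(correlate(series, series)) # 140.0
```

In a real deployment these would run inside MR tasks (or use an FFT library for the spectral work), which is exactly why per-task CPU headroom matters.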
> > > Other important consideration is low(ish) power
> > > consumption. With that in mind I had specced out the following (per node):
> > >
> > > Chassis: 1U Supermicro chassis with 2x 1Gb/sec ethernet ports
> > > (http://www.supermicro.com/products/system/1u/5017/sys-5017c-mtf.cfm)
> > > (~500USD)
> > > Memory: 32GB Unbuffered ECC RAM (~280USD)
> > > Disks: 4x 2TB Hitachi Ultrastar 7200RPM SAS Drives (~960USD)
> > >
> > >
> >
> > You can use plain SATA. Don't need SAS.
> >
> >
>
> This is a government-sponsored project, so some requirements (like MTBF
> and spindle warranty) are "set in stone", but I'll look into that.
> > > CPU: 1x Intel E3-1230-v2 (3.3Ghz 4 Core / 8 Thread 69W) (~240USD)
> > >
> > >
> >
> > Consider getting dual hex core CPUs.
> >
> >
>
> I'm trying to avoid that for two reasons: dual-socket boards are (1)
> more expensive and (2) power hungry. Additionally, the CPUs for those
> boards are also more expensive and less efficient than their
> single-socket counterparts (take a look at Intel's E3 and E5 line
> pricing). The guidelines from the quoted article state:
>
> "Compute Intensive Configuration (2U/machine): Two quad core CPUs,
> 48-72GB memory, and 8 disk drives (1TB or 2TB). These are often used
> when a combination of large in-memory models and heavy reference data
> caching is required."
>
> My two 1U machines, which are equivalent to this recommendation, have 8
> (very fast, low-wattage) cores, 64GB RAM, and 8 2TB disks.
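The equivalence claimed above can be checked with back-of-the-envelope arithmetic; the per-node figures below come from the specs in this thread, and the comparison is purely illustrative:

```python
# One 1U single-socket node as specced above (E3-1230-v2, 32GB, 4 disks).
one_u_node = {"cores": 4, "ram_gb": 32, "disks": 8 // 2}

# Two such nodes, compared against the article's 2U "compute intensive"
# box (two quad-core CPUs, 48-72GB RAM, 8 disks).
two_nodes = {k: 2 * v for k, v in one_u_node.items()}
print(two_nodes)  # {'cores': 8, 'ram_gb': 64, 'disks': 8}
```

So on raw totals the two 1U boxes match the 2U recommendation, with the added benefit of spreading the daemons across two failure domains.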
>
> > > The backplane will consist of a dedicated high powered switch (not sure
> > > which one yet) with each node utilizing link aggregation.
> > >
> > > Does this look reasonable? We are looking into buying 4-5 of those for
> > > our initial test bench for under $10000 and plan to expand to about