HBase, mail # user - DataNode Hardware


Bartosz M. Frak 2012-07-12, 19:56
Amandeep Khurana 2012-07-12, 20:00
Bartosz M. Frak 2012-07-12, 20:20
Amandeep Khurana 2012-07-12, 20:26
Bartosz M. Frak 2012-07-12, 21:00
Re: DataNode Hardware
Michael Segel 2012-07-12, 23:22
Uhm... I'd take a step back...
> Thanks for the reply. I didn't realize that all the non-MR tasks were this CPU-bound; plus, my naive assumption was that four spindles would have a hard time supplying data to MR fast enough for it to become bogged down.
Your gut feel is correct.

If you go with 12 cores in a 1U box and 4 drives, you will be disk I/O bound and you will end up watching CPU I/O wait cycles increase.
On a 1U box, 8 cores would be a better balance. Maybe go with 2.5" drives and more spindles.
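A rough way to compare the configurations discussed in this thread is to count how many cores are left for MapReduce tasks and how many cores share each spindle. This is a back-of-the-envelope sketch, not a sizing tool: it assumes Amandeep's breakdown quoted further down (1 core for the OS, 1 for the DataNode and TaskTracker, 1 for the RegionServer), and the names `NodeConfig`, `task_cores`, and `cores_per_spindle` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: cores reserved for non-task processes, following the breakdown
# quoted in this thread: 1 for the OS, 1 for DataNode + TaskTracker,
# 1 for the RegionServer.
RESERVED_CORES = 3


@dataclass
class NodeConfig:
    """Hypothetical description of one worker node."""
    name: str
    cores: int
    spindles: int


def task_cores(node: NodeConfig) -> int:
    """Cores left for MapReduce tasks after the reserved daemons (never negative)."""
    return max(node.cores - RESERVED_CORES, 0)


def cores_per_spindle(node: NodeConfig) -> float:
    """Physical cores per data disk; a higher ratio means more contention for disk I/O."""
    return node.cores / node.spindles


if __name__ == "__main__":
    configs = [
        NodeConfig("4 cores, 4 drives", cores=4, spindles=4),
        NodeConfig("8 cores, 4 drives", cores=8, spindles=4),
        NodeConfig("dual hex core, 4 drives", cores=12, spindles=4),
    ]
    for c in configs:
        print(f"{c.name}: task cores={task_cores(c)}, "
              f"cores/spindle={cores_per_spindle(c):.1f}")
```

With 4 drives, the 12-core box ends up at 3 cores per spindle, which is the imbalance described above; adding spindles (e.g., 2.5" drives) lowers that ratio without changing the number of task cores.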

If you don't run HBase, 4GB per core is OK for MapReduce alone. You will want more memory for HBase.
8 cores with 32GB is OK for M/R; with HBase, 48GB is better.
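The memory rule of thumb above can be turned into a small helper. This is a sketch under a loudly stated assumption: the extra RAM for HBase is modelled as a flat 16GB inferred from the single 8-core example (32GB for M/R vs 48GB with HBase), which is an illustration and not a published HBase sizing rule. The function name `suggested_ram_gb` is hypothetical.

```python
# Rule of thumb from this thread: about 4 GB of RAM per core for MapReduce alone.
GB_PER_CORE_MR = 4

# Assumption: HBase headroom modelled as a flat amount, inferred from the
# 8-core example in this thread (32 GB for M/R vs 48 GB with HBase).
# Not an official HBase recommendation.
HBASE_EXTRA_GB = 16


def suggested_ram_gb(cores: int, runs_hbase: bool) -> int:
    """Rough per-node RAM suggestion in GB."""
    ram = cores * GB_PER_CORE_MR
    if runs_hbase:
        ram += HBASE_EXTRA_GB
    return ram


if __name__ == "__main__":
    for cores in (4, 8, 12):
        print(f"{cores} cores: M/R only {suggested_ram_gb(cores, False)} GB, "
              f"with HBase {suggested_ram_gb(cores, True)} GB")
```

For 8 cores this reproduces the 32GB and 48GB figures; for other core counts the flat HBase term is only an extrapolation.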
On Jul 12, 2012, at 4:00 PM, Bartosz M. Frak wrote:

> Amandeep Khurana wrote:
>> The issue with having fewer cores per box is that you are colocating the DataNode, RegionServer, TaskTracker, and the MR tasks themselves. Plus, you need a core for the OS. All of these need to run on a single node, so you need a minimum amount of resources that can handle them well. I don't see how you will be able to do compute-heavy work on 4 cores: even if you give 1 to the OS, 1 to the DataNode and TaskTracker processes, and 1 to the RegionServer, you are left with only 1 core for the actual tasks.
>> Also, if you really want reliable low-latency access to data, I would separate the MR framework onto an independent cluster and put HBase on its own cluster. The MR framework will still talk to the HBase cluster for lookups. You'll still benefit from the caching, etc., but HBase will be better able to guarantee performance.
>>
>> -Amandeep
>>
>>  
> Thanks for the reply. I didn't realize that all the non-MR tasks were this CPU-bound; plus, my naive assumption was that four spindles would have a hard time supplying data to MR fast enough for it to become bogged down.
>
>> On Thursday, July 12, 2012 at 1:20 PM, Bartosz M. Frak wrote:
>>
>>  
>>> Amandeep Khurana wrote:
>>>    
>>>> Inline.
>>>>
>>>> On Thursday, July 12, 2012 at 12:56 PM, Bartosz M. Frak wrote:
>>>>
>>>>      
>>>>> Quick question about DataNode hardware. I've read a few articles that cover the basics, including Cloudera's recommendations here:
>>>>> http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/
>>>>>
>>>>> The article is from early 2010, but I'm assuming that the general guidelines haven't deviated much from the recommended baselines. I'm skewing my build towards the "Compute optimized" side of the spectrum, which calls for a 1:1 core-to-spindle ratio and more RAM per node for in-memory caching.
>>>>>
>>>>>
>>>>>        
>>>> Why are you skewing more towards compute optimized? Are you expecting to run compute-intensive MR jobs that interact with HBase tables?
>>>>
>>>>      
>>> Correct. We'll be storing dense, raw, numerical time-based data, which will need to be transformed (decimated, FFTed, correlated, etc.) with relatively low latency (under 10 seconds). We also expect repeated reads, where the same piece of data is "looked" at more than once in a short amount of time. This is where we are hoping that in-memory caching and DataNode affinity can help us.
>>>    
>>>>> Another important consideration is low(ish) power consumption. With that in mind, I've specced out the following (per node):
>>>>>
>>>>> Chassis: 1U Supermicro chassis with 2x 1Gb/s Ethernet ports (http://www.supermicro.com/products/system/1u/5017/sys-5017c-mtf.cfm) (~500USD)
>>>>> Memory: 32GB Unbuffered ECC RAM (~280USD)
>>>>> Disks: 4x 2TB Hitachi Ultrastar 7200RPM SAS drives (~960USD)
>>>>>
>>>>>
>>>>>        
>>>> You can use plain SATA. Don't need SAS.
>>>>
>>>>      
>>> This is a government-sponsored project, so some requirements (like MTBF and spindle warranty) are "set in stone", but I'll look into that.
>>>    
>>>>> CPU: 1x Intel E3-1230 v2 (3.3GHz, 4 cores / 8 threads, 69W) (~240USD)
>>>>>
>>>>>
>>>>>        
>>>> Consider getting dual hex core CPUs.