Home | About | Sematext search-lucene.com search-hadoop.com
Hadoop >> mail # user >> Re: Which hardware to choose
Re: Which hardware to choose
Ah, that's the $64,000 question....

I tend to be conservative so this should be a good starting point.

You start with two things: the amount of memory available and the number of physical cores.

Subtract a core for each main daemon process: the DataNode (DN), TaskTracker (TT), and RegionServer (RS) if you're running HBase.
Take the remaining cores and, if you're running on Intel with Hyper-Threading, multiply them by 2.
That's the maximum number of slots you should use when configuring Hadoop.

Note: for each slot, you should have at least 1GB of memory.
You may want to plan on 2GB per slot so your child JVM heap (mapred.child.java.opts) can go up to 2GB before you have to reduce the number of slots.

So if you have dual hexa-core CPUs and run HBase, it looks like the following:
12 cores, less one each for the DN, TT, and RS = 9 cores; x 2 for Hyper-Threading = 18 slots that can be a mix of mappers and reducers.
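The back-of-the-envelope math above can be sketched as a small helper (a sketch only; the function name and defaults are illustrative, not part of any Hadoop tool):

```python
def max_slots(physical_cores, daemons=3, hyperthreading=True):
    """Estimate the max map+reduce slots for one node.

    daemons: cores reserved for the main processes, i.e. DataNode,
    TaskTracker, and RegionServer (3 if running HBase, 2 if not).
    """
    usable = physical_cores - daemons
    # Hyper-Threading roughly doubles the usable hardware threads.
    return usable * 2 if hyperthreading else usable

# Dual hexa-core (12 physical cores), running HBase:
print(max_slots(12))  # 18
```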

That's a good starting position and you can ramp it up based on what you observe.

YMMV of course.

Note: When I run HBase, I don't want any swapping, so you have to pay attention to the amount of memory on the system and how it's being allocated.
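To see whether a given slot count risks swapping, you can total the planned heap usage against physical RAM (illustrative numbers only; the daemon heap sizes here are assumptions, not measurements from any real cluster):

```python
def memory_budget_gb(slots, child_heap_gb=2, daemon_heaps_gb=(1, 1, 8)):
    """Rough worst-case heap demand for one node.

    daemon_heaps_gb: assumed heaps for the DN, TT, and RS daemons.
    """
    return slots * child_heap_gb + sum(daemon_heaps_gb)

# 18 slots at 2GB each, plus ~10GB of daemons, on a 48GB node:
need = memory_budget_gb(18)
print(need, need <= 48)  # 46 True
```

If the total comes in near or above physical RAM, drop the slot count or the child heap before the kernel starts swapping out the RegionServer.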

:-)
On Oct 2, 2012, at 8:57 PM, Marcos Ortiz <[EMAIL PROTECTED]> wrote:

> Which is a reasonable number in this hardware?
>
> On 10/02/2012 09:40 PM, Michael Segel wrote:
>> I think he's saying that it's 24 map and 8 reduce slots per node, and at 48GB that could be too many mappers.
>> Especially if they want to run HBase.
>>
>> On Oct 2, 2012, at 8:14 PM, hadoopman <[EMAIL PROTECTED]> wrote:
>>
>>> Only 24 map and 8 reduce tasks for 38 data nodes? Are you sure that's right? Sounds VERY low for a cluster that size.
>>>
>>> We have only 10 C2100's and are running, I believe, 140 map and 70 reduce slots so far with pretty decent performance.
>>>
>>>
>>>
>>> On 10/02/2012 12:55 PM, Alexander Pivovarov wrote:
>>>> 38 data nodes + 2 Name Nodes
>>>>
>>>> Data Node:
>>>> Dell PowerEdge C2100 series
>>>> 2 x XEON X5670
>>>> 48 GB RAM ECC (12x4GB 1333MHz)
>>>> 12 x 2 TB 7200 RPM SATA HDD (with hot swap), JBOD
>>>> Intel Gigabit ET dual-port PCIe x4
>>>> Redundant power supply
>>>> Hadoop CDH3
>>>> max map tasks: 24
>>>> max reduce tasks: 8
>>>
>>
>> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
>> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>>
>> http://www.uci.cu
>> http://www.facebook.com/universidad.uci
>> http://www.flickr.com/photos/universidad_uci
>
> --
> Marcos Luis Ortíz Valmaseda
> Data Engineer && Sr. System Administrator at UCI
> about.me/marcosortiz
> My Blog
> Tumblr's blog
> @marcosluis2186
>
>  
>
