Hadoop, mail # user - Re: Which hardware to choose


Re: Which hardware to choose
Michael Segel 2012-10-03, 17:21
Well...

If you're not running HBase, a small amount of swapping hurts you less, so you could push the number of slots up and oversubscribe.
The only thing I would suggest is that you monitor your system closely as you adjust the number of slots.
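
One way to do that monitoring, sketched under assumptions: on a Linux node, /proc/vmstat exposes cumulative `pswpin` and `pswpout` page counters, so sampling it before and after a change in slot count shows whether the node has started swapping. The helper names below (`parse_swap_counters`, `swap_delta`) are hypothetical and not part of Hadoop; the sample values are made up for illustration.

```python
def parse_swap_counters(vmstat_text):
    """Extract the cumulative swap-in/swap-out page counts from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Pages swapped in and out between two samples."""
    return {key: after[key] - before[key] for key in ("pswpin", "pswpout")}


# Illustrative samples only; on a real node, read the text from /proc/vmstat
# at two points in time (for example, before and after raising the slot count).
sample_before = "nr_free_pages 1000\npswpin 120\npswpout 300\n"
sample_after = "nr_free_pages 800\npswpin 150\npswpout 900\n"

delta = swap_delta(parse_swap_counters(sample_before),
                   parse_swap_counters(sample_after))
print(delta)
```

A delta that keeps growing while jobs run suggests the slot count is too high for the memory available.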

You have to admit, though, it's fun to tune the cluster. :-)

On Oct 3, 2012, at 12:09 PM, J. Rottinghuis <[EMAIL PROTECTED]> wrote:

> Of course it all depends...
> But something like this could work:
>
> Leave 1-2 GB for the kernel, pagecache, tools, overhead etc.
> Plan 3-4 GB for Datanode and Tasktracker each
>
> Plan 2.5-3 GB per slot. Depending on the kinds of jobs, you may need more
> or less memory per slot.
> Have 2-3 times as many mappers as reducers (depending on the kinds of jobs
> you run).
>
> As Michael pointed out, the ratio of cores (hyperthreads) per disk matters.
>
> With those initial rules of thumb you'd arrive somewhere between
> 10 mappers + 5 reducers
> and
> 9 mappers + 4 reducers
>
> Try, test, measure, adjust, rinse, repeat.
>
> Cheers,
>
> Joep
>
> On Tue, Oct 2, 2012 at 8:42 PM, Alexander Pivovarov <[EMAIL PROTECTED]> wrote:
>
>> All configs are per node.
>> No HBase, only Hive and Pig installed
>>
>> On Tue, Oct 2, 2012 at 9:40 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>>
>>> I think he's saying that it's 24 maps and 8 reducers per node, and at 48GB
>>> that could be too many mappers.
>>> Especially if they want to run HBase.
>>>
>>> On Oct 2, 2012, at 8:14 PM, hadoopman <[EMAIL PROTECTED]> wrote:
>>>
>>>> Only 24 map and 8 reduce tasks for 38 data nodes? Are you sure that's
>>>> right? Sounds VERY low for a cluster that size.
>>>>
>>>> We have only 10 C2100s and are running, I believe, 140 map and 70 reduce
>>>> slots so far, with pretty decent performance.
>>>>
>>>>
>>>>
>>>> On 10/02/2012 12:55 PM, Alexander Pivovarov wrote:
>>>>> 38 data nodes + 2 Name Nodes
>>>>>>>
>>>>>>> Data Node:
>>>>>>> Dell PowerEdge C2100 series
>>>>>>> 2 x XEON x5670
>>>>>>> 48 GB RAM ECC  (12x4GB 1333MHz)
>>>>>>> 12 x 2 TB  7200 RPM SATA HDD (with hot swap)  JBOD
>>>>>>> Intel Gigabit ET Dual port PCIe x4
>>>>>>> Redundant Power Supply
>>>>>>> Hadoop CDH3
>>>>>>> max map tasks 24
>>>>>>> max reduce tasks 8
>>>>
>>>>
>>>
>>>
>>
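
The rules of thumb quoted above can be worked through as arithmetic. The following is a minimal sketch, assuming the 48 GB nodes from the original post and reading "3-4 GB for Datanode and Tasktracker each" as two separate reservations; the function names (`slot_budget`, `split_slots`) are hypothetical and the rounding choices are mine, not something stated in the thread.

```python
import math


def slot_budget(total_gb, os_gb, daemon_gb, per_slot_gb):
    """Number of task slots that fit after reserving OS and daemon memory."""
    # Two daemons per node are assumed: DataNode and TaskTracker.
    usable_gb = total_gb - os_gb - 2 * daemon_gb
    return math.floor(usable_gb / per_slot_gb)


def split_slots(slots, maps_per_reducer):
    """Divide a slot count into (mappers, reducers) at the given ratio."""
    reducers = max(1, round(slots / (maps_per_reducer + 1)))
    return slots - reducers, reducers


# Most generous reading: 1 GB OS, 3 GB per daemon, 2.5 GB per slot.
upper = slot_budget(48, 1, 3, 2.5)
# Most conservative reading: 2 GB OS, 4 GB per daemon, 3 GB per slot.
lower = slot_budget(48, 2, 4, 3.0)
print(lower, upper)

# With roughly 2 mappers per reducer:
print(split_slots(15, 2))
print(split_slots(13, 2))
```

Under these assumptions the budget comes out at 12 to 16 slots per node, and splitting 15 or 13 slots at about two mappers per reducer gives 10 + 5 and 9 + 4, the figures quoted in the thread. On an MRv1 cluster such as CDH3, the per-node limits are normally set with `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum` in mapred-site.xml.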