Hadoop >> mail # user >> Hadoop cluster optimization


Re: Hadoop cluster optimization

On Aug 21, 2011, at 7:17 PM, Michel Segel wrote:

> Avi,
> First, why a 32-bit OS?
> You have a 64-bit processor with 4 hyper-threaded cores, which looks like 8 CPUs.

With only 1.7 GB of memory, there likely isn't much of a reason to use a 64-bit OS.  The machines (as you point out) are already tight on memory, and 64-bit is only going to make it worse.
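
To make "tight on memory" concrete, here is a rough sketch of a per-node heap budget. Only the 1.7 GB figure comes from this thread; the daemon heaps, slot counts, child heap, and OS reserve below are illustrative assumptions, not the poster's actual settings, and JVM heap is only part of a process's real footprint.

```python
def node_memory_budget_mb(total_mb, daemon_heaps_mb, map_slots,
                          reduce_slots, child_heap_mb, os_reserve_mb):
    """Return the MB left over after daemon heaps, task heaps, and an OS reserve.

    A negative result means the configured heaps alone oversubscribe the node.
    """
    # Every task slot can run one child JVM with its own heap.
    task_heap_mb = (map_slots + reduce_slots) * child_heap_mb
    return total_mb - sum(daemon_heaps_mb) - task_heap_mb - os_reserve_mb


# 1.7 GB is roughly 1740 MB. Everything else here is a hypothetical setting.
headroom = node_memory_budget_mb(
    total_mb=1740,
    daemon_heaps_mb=[256, 256],  # assumed DataNode and TaskTracker heaps
    map_slots=2,                 # assumed
    reduce_slots=1,              # assumed
    child_heap_mb=200,           # assumed per-task heap
    os_reserve_mb=256,           # assumed reserve for the OS and page cache
)
print(headroom)  # 372
```

With numbers like these there is little slack, so any extra per-process overhead comes straight out of the headroom.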

>>
>> 1.7 GB memory
>> 1 Intel(R) Xeon(R) CPU E5507 @ 2.27GHz
>> Ubuntu Server 10.10 , 32-bit platform
>> Cloudera CDH3 Manual Hadoop Installation
>> (for the ones who are familiar with Amazon Web Services, I am talking about
>> Small EC2 Instances/Servers)
>>
>> Total job run time is about 15 minutes (about 50 files/blocks/map tasks of up
>> to 250 MB each, and 10 reduce tasks).
>>
>> Based on the above information, can anyone recommend a best-practice
>> configuration?

How many spindles?  Are your tasks spilling?
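
One rough way to check for spilling is to compare the "Spilled Records" counter with "Map output records" for the map tasks. The sketch below assumes you read those two values off the job's counter page yourself; the function names and the tolerance are hypothetical. A ratio near 1 suggests each record was written to disk about once, while a ratio well above 1 suggests extra spill passes, in which case a larger io.sort.mb may help if the task heap has room for it. Combiners and reduce-side spills can skew the ratio, so treat it as a hint only.

```python
def spill_ratio(spilled_records, map_output_records):
    """Spilled records per map output record, from map-side counters."""
    if map_output_records <= 0:
        raise ValueError("map_output_records must be positive")
    return spilled_records / map_output_records


def has_extra_spills(spilled_records, map_output_records, tolerance=1.0):
    """True when records were spilled more often than the assumed baseline of once."""
    return spill_ratio(spilled_records, map_output_records) > tolerance


# Hypothetical counter values, not taken from the poster's job.
print(spill_ratio(3_000_000, 1_000_000))       # 3.0
print(has_extra_spills(3_000_000, 1_000_000))  # True
print(has_extra_spills(1_000_000, 1_000_000))  # False
```
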
>> Do you think that, when dealing with such a small cluster and processing
>> such a small amount of data, it is even possible to optimize jobs so they
>> would run much faster?

Most of the time, performance issues are with the algorithm, not Hadoop.