Avi Vaknin 2011-08-21, 11:57
stanley.shi@... 2011-08-22, 01:46
Michel Segel 2011-08-22, 02:17
Allen Wittenauer 2011-08-22, 04:05
Avi Vaknin 2011-08-22, 09:55
אבי ווקנין 2011-08-22, 10:00
Ian Michael Gumby 2011-08-22, 13:34
Re: Hadoop cluster optimization
Allen Wittenauer 2011-08-22, 16:19
On Aug 22, 2011, at 3:00 AM, אבי ווקנין wrote:
> I assumed that the 1.7GB of RAM would be the bottleneck in my environment;
> that's why I am trying to change it now.
> I shut down the 4 datanodes with 1.7GB RAM (Amazon EC2 small instance) and
> replaced them with 2 datanodes with 7.5GB RAM (Amazon EC2 large instance).
This should allow you to bump up the memory and/or increase the task count.
> Is it OK that the datanodes are 64-bit while the namenode is still 32-bit?
I've run several instances where the NN was 64-bit and the DNs were 32-bit. I can't think of a reason the reverse wouldn't work. The only thing that really matters is whether they share the same CPU architecture (which, if you are running on EC2, will likely always be the case).
> Based on the new hardware I'm using, are there any suggestions regarding the
> Hadoop configuration parameters?
It really depends upon how much memory you need per task. That's why the task spill rate is important. :)
> One more thing: you asked, "Are your tasks spilling?"
> How can I check whether my tasks are spilling?
Check the task logs.
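As a rough illustration of checking the task logs, the sketch below counts spill events in a map task's log text. It assumes that the Hadoop 0.20-era MapTask writes one line containing "Finished spill N" (N counting from 0) each time it flushes its in-memory sort buffer to disk; treat that log format as an assumption and confirm it against your own logs. One spill per map task is expected even when the output fits in the buffer, so more than one suggests the buffer was too small. The function names are hypothetical. Comparing the job's "Spilled Records" counter with "Map output records" on the job page is another way to get at the same signal.

```python
import re

# Assumption: each spill to disk produces a log line containing
# "Finished spill N", where N is a 0-based spill index.
SPILL_RE = re.compile(r"Finished spill (\d+)")


def count_spills(log_text: str) -> int:
    """Return how many spill-to-disk events appear in one map task's log."""
    return len(SPILL_RE.findall(log_text))


def spilled_more_than_once(log_text: str) -> bool:
    """True if the task spilled more than once.

    A single final spill is normal; extra spills mean the map output did not
    fit in the in-memory sort buffer.
    """
    return count_spills(log_text) > 1
```

You would run this over each map task's log; if most tasks report a single spill, memory per task is probably not the constraint.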
If you aren't spilling, then you'll likely want to set the task count to core count - 1, unless memory is exhausted first (i.e., tasks * memory per task should be less than the available memory).
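The rule of thumb above (core count - 1 tasks, capped so that tasks * memory per task stays within available memory) can be sketched as a small calculation. This is a hedged illustration, not an official formula: `suggest_task_slots` is a hypothetical helper, the available-memory figure is assumed to be what remains after the OS and the DataNode/TaskTracker daemons, and the per-task memory is assumed to be roughly the child JVM heap (set via `mapred.child.java.opts` in Hadoop 0.20). The result is a total that you would still split between `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`.

```python
def suggest_task_slots(cores: int, avail_mem_mb: int, mem_per_task_mb: int) -> int:
    """Hypothetical helper: concurrent task slots per node.

    Uses cores - 1 tasks unless memory runs out first, so that
    tasks * mem_per_task_mb does not exceed avail_mem_mb.
    avail_mem_mb is an assumption: memory left after the OS and Hadoop daemons.
    """
    if cores < 1 or avail_mem_mb <= 0 or mem_per_task_mb <= 0:
        raise ValueError("cores, available memory and per-task memory must be positive")
    # CPU bound: leave one core for the OS and daemons, but keep at least one slot.
    by_cpu = max(cores - 1, 1)
    # Memory bound: how many tasks fit in the memory left for tasks.
    by_mem = avail_mem_mb // mem_per_task_mb
    return min(by_cpu, by_mem)
```

For example, with assumed figures of 8 cores, 6144 MB left for tasks and 1024 MB per task, memory is the tighter limit and the sketch returns 6 rather than 7.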