Here's the problem in a nutshell:
With respect to hardware, you have 5.4K RPM drives? Six drives and 8 cores?
Small, slow drives, and still a ratio of less than one when you compare spindles to cores.
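To put rough numbers on the hardware points above, here is a small sketch. It assumes the RegionServer specs quoted later in the thread: 6 drives and two quad-core CPUs per node. Average rotational latency is the time for half a revolution, so it depends only on spindle speed. The spindle-to-core ratio is a simple division. Seek time is deliberately ignored here.

```python
# Back-of-envelope disk numbers for one RegionServer node.
# Assumes the hardware quoted in the thread: 6 data drives, 2 x quad-core CPUs.

def avg_rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency = time for half a revolution, in ms."""
    return 0.5 * 60_000 / rpm

drives_per_node = 6
cores_per_node = 2 * 4  # two quad-core Xeon E5630s

# Compare the 5.4K RPM data drives with the 7.2K RPM drives on the master.
for rpm in (5400, 7200):
    print(f"{rpm} RPM: {avg_rotational_latency_ms(rpm):.2f} ms avg rotational latency")

print(f"spindles per core: {drives_per_node / cores_per_node:.2f}")
```

At 5,400 RPM this gives about 5.6 ms of rotational latency per random access, versus about 4.2 ms at 7,200 RPM. Six drives against eight cores works out to 0.75 spindles per core.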
I appreciate that you want to maximize performance, but when it comes to tuning, you have to start before you get your hardware.
You are asking a question about tuning, but how can we say whether the numbers are OK?
Have you looked at your GC settings and enabled MSLAB? We don't know. What about your network configuration?
I mean that there's a lot missing, and fine-tuning a cluster is something you have to do on your own. I guess I could say your numbers look fine to me for that config... but honestly, it would be a SWAG.
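The GC and MSLAB questions usually come down to two places. The first is the `hbase.hregion.memstore.mslab.enabled` property in hbase-site.xml. The second is the JVM collector flags passed through `HBASE_OPTS` in hbase-env.sh. Whether MSLAB is already on depends on the HBase version.

The sketch below only renders both settings as text, using Python's standard XML module. The flag values are illustrative assumptions, not recommendations for this cluster; this applies in particular to `CMSInitiatingOccupancyFraction=70`. The CMS flags also apply only to the older HotSpot JVMs of that era.

```python
# Sketch: render the MSLAB property and GC flags as config snippets.
# The property and JVM flag names are real; the chosen values
# (e.g. CMSInitiatingOccupancyFraction=70) are illustrative assumptions.
import xml.etree.ElementTree as ET

def hbase_site_property(name: str, value: str) -> str:
    """Return one <property> element as it would appear in hbase-site.xml."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")

# MemStore-Local Allocation Buffers: reduce old-generation heap fragmentation.
mslab = hbase_site_property("hbase.hregion.memstore.mslab.enabled", "true")

# Concurrent (CMS) collector plus GC logging, for a line in hbase-env.sh.
gc_flags = [
    "-XX:+UseConcMarkSweepGC",
    "-XX:+UseParNewGC",
    "-XX:CMSInitiatingOccupancyFraction=70",  # assumed value
    "-verbose:gc",
    "-XX:+PrintGCDetails",
    "-XX:+PrintGCTimeStamps",
]
hbase_opts = 'export HBASE_OPTS="$HBASE_OPTS ' + " ".join(gc_flags) + '"'

print(mslab)
print(hbase_opts)
```

The GC logging flags matter as much as the collector choice. Long pauses in those logs are the first thing to correlate with slow benchmark runs.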
On Feb 1, 2012, at 7:09 AM, Tim Robertson <[EMAIL PROTECTED]> wrote:
> Thanks Michael,
> It's a small cluster, but is the hardware so bad? We are particularly
> interested in relatively low load for random read/write (2000
> transactions per second on rows under 1 KB) but a decent full table
> scan speed, as we aim to mount Hive tables on top of HBase tables.
> Regarding tuning, I'm not exactly sure which settings you would be
> interested in seeing. The config is all here:
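Here is a back-of-envelope check of the stated target (about 2000 transactions per second on rows under 1 KB) against the RegionServer disks: 3 nodes with 6 drives each at 5.4K RPM. The ~12 ms average seek time is an assumed figure, not from the thread. The estimate also ignores HDFS replication, compactions and caching.

```python
# Rough sizing for ~2000 random ops/s on rows under 1 KB,
# across 3 RegionServers with 6 x 5.4K RPM drives each.
# ASSUMPTION: ~12 ms average seek (hypothetical; check the drive datasheet).

TARGET_OPS_PER_SEC = 2000
ROW_BYTES_MAX = 1024
NODES = 3
DRIVES_PER_NODE = 6
RPM = 5400
ASSUMED_SEEK_MS = 12.0

# Upper bound on raw payload bandwidth for the target rate.
payload_mb_per_sec = TARGET_OPS_PER_SEC * ROW_BYTES_MAX / 1024**2

# One uncached random read costs roughly one seek plus half a rotation.
rotational_ms = 0.5 * 60_000 / RPM
iops_per_drive = 1000 / (ASSUMED_SEEK_MS + rotational_ms)
cluster_random_iops = iops_per_drive * NODES * DRIVES_PER_NODE

print(f"payload: <= {payload_mb_per_sec:.2f} MB/s")
print(f"uncached random IOPS, whole cluster: ~{cluster_random_iops:.0f}")
```

With that assumed seek time, the cluster manages on the order of 1,000 uncached random reads per second. That is about half the target if every operation were a cache miss. The payload itself is only about 2 MB/s. Whether 2000 ops/s is reachable therefore depends mostly on how much read traffic the block cache absorbs. HBase writes go to the write-ahead log and MemStore, not to random disk locations.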
> On Wed, Feb 1, 2012 at 1:56 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>> What tuning did you do?
>> Why such a small cluster?
>> Sorry, but when you start off with a bad hardware configuration, you can get Hadoop/HBase to work, yet performance will always be sub-optimal.
>> On Feb 1, 2012, at 6:52 AM, "Tim Robertson" <[EMAIL PROTECTED]> wrote:
>>> Hi all,
>>> We have a 3-node cluster (CDH3u2) with the following hardware:
>>> RegionServers (+DN + TT)
>>> CPU: 2x Intel(R) Xeon(R) CPU E5630 @ 2.53GHz (quad)
>>> Disks: 6x250G SATA 5.4K
>>> Memory: 24GB
>>> Master (+ZK, JT, NN)
>>> CPU: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz, 2x6MB (quad)
>>> Disks: 2x500G SATA 7.2K
>>> Memory: 8GB
>>> Memory wise, we have:
>>> NN: 1GB
>>> JT: 1GB
>>> HBase master: 6GB
>>> ZK: 1GB
>>> RegionServer: 6GB
>>> TaskTracker: 1GB
>>> 11 Mappers @ 1GB each
>>> 7 Reducers @ 1GB each
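The sketch below adds up the heap sizes listed above for each machine type. It assumes the mapper and reducer counts are per-node slot limits. It also assumes every figure is a maximum heap (-Xmx) that may be reached at the same time. The DataNode heap is not listed in the thread, so it is left out, as are JVM overhead beyond the heap and the OS page cache.

```python
# Sum configured heaps per machine type and compare with physical RAM.
# ASSUMPTIONS: figures are -Xmx ceilings that can all be reached at once;
# mapper/reducer counts are per-node slots; DataNode heap (not listed),
# JVM non-heap overhead, and the OS page cache are not counted.

master = {"NameNode": 1, "JobTracker": 1, "HBase master": 6, "ZooKeeper": 1}
regionserver = {
    "RegionServer": 6,
    "TaskTracker": 1,
    "mappers (11 x 1GB)": 11 * 1,
    "reducers (7 x 1GB)": 7 * 1,
}
physical_gb = {"master": 8, "regionserver": 24}

for name, heaps in (("master", master), ("regionserver", regionserver)):
    total = sum(heaps.values())
    print(f"{name}: {total} GB of heap configured vs {physical_gb[name]} GB RAM")
```

Under those assumptions both machine types are already oversubscribed. The 8 GB master has 9 GB of heap configured. Each 24 GB RegionServer has 25 GB, before the DataNode is counted. If every task slot is busy at once, the nodes could be pushed into swap, which tends to hurt HBase latency badly.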
>>> HDFS was empty, and I ran randomWrite and scan, both with the number
>>> of clients set to 50 (it seemed to spawn 500 mappers, though...)
>>> 12/02/01 13:27:47 INFO mapred.JobClient: ROWS=52428500
>>> 12/02/01 13:27:47 INFO mapred.JobClient: ELAPSED_TIME=84504886
>>> 12/02/01 13:42:52 INFO mapred.JobClient: ROWS=52428500
>>> 12/02/01 13:42:52 INFO mapred.JobClient: ELAPSED_TIME=8158664
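The job counters above are easier to judge per mapper. This sketch makes three assumptions that the log does not state. First, ELAPSED_TIME is the sum of per-mapper run times in milliseconds. Second, each of the 500 map tasks processed an equal share of the rows. Third, the first pair of counters belongs to the randomWrite job and the second to the scan job, following the order in which the runs are described.

```python
# Interpret the PerformanceEvaluation job counters quoted above.
# ASSUMPTIONS: ELAPSED_TIME = sum of per-mapper run times in ms;
# rows were split evenly over 500 mappers; first pair = randomWrite,
# second pair = scan.

MAPPERS = 500
runs = {
    "randomWrite": {"rows": 52_428_500, "elapsed_ms_sum": 84_504_886},
    "scan": {"rows": 52_428_500, "elapsed_ms_sum": 8_158_664},
}

for name, c in runs.items():
    rows_per_mapper = c["rows"] // MAPPERS
    avg_mapper_s = c["elapsed_ms_sum"] / MAPPERS / 1000
    per_mapper_rate = rows_per_mapper / avg_mapper_s
    print(
        f"{name}: {rows_per_mapper} rows/mapper, "
        f"{avg_mapper_s:.1f} s avg per mapper, "
        f"~{per_mapper_rate:.0f} rows/s per mapper"
    )
```

The rows divide exactly into 500 × 104,857. That matches the 500 mappers observed and suggests the tool split each of the 50 clients into 10 map tasks. Each randomWrite mapper averaged about 169 s and each scan mapper about 16 s, so scans ran roughly ten times faster per mapper.

Cluster-wide throughput also depends on how many mappers ran concurrently. If the 11 map slots are per node, at most 33 ran at a time. The job's wall-clock duration from the JobTracker is therefore the more direct figure to compare against expectations.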
>>> Would I be correct in thinking that this is way below what is to be
>>> expected of this hardware?
>>> We're setting up Ganglia now to start debugging, but any suggestions
>>> on how to diagnose this would be greatly appreciated.