HBase user mailing list


Re: PerformanceEvaluation results
Tim,

Here's the problem in a nutshell.
With respect to hardware, you have 5.4K RPM drives, 6 of them, and 8 cores?
Small, slow drives, and still a ratio of less than one when you compare spindles to cores.
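Mike's hardware point can be made concrete with a little arithmetic. The sketch below is an editorial illustration, not part of the original message. It takes the per-RegionServer figures from Tim's original post (6 data drives, two quad-core CPUs) and computes spindles per core. It treats "at least one spindle per core" as the rule of thumb Mike's comment implies, not as a documented HBase requirement.

```python
# Per-RegionServer hardware, taken from Tim's original message.
DRIVES_PER_NODE = 6       # 6x250G SATA 5.4K
CORES_PER_NODE = 2 * 4    # 2x quad-core Xeon E5630

# Assumption: ">= 1 spindle per core" is a rule of thumb, not a hard limit.
RULE_OF_THUMB = 1.0


def spindle_to_core_ratio(drives: int, cores: int) -> float:
    """Return the number of physical disks available per CPU core."""
    return drives / cores


ratio = spindle_to_core_ratio(DRIVES_PER_NODE, CORES_PER_NODE)
print(f"spindles per core: {ratio:.2f}")
if ratio < RULE_OF_THUMB:
    print("fewer spindles than cores: disk I/O is a likely bottleneck")
```

With 6 drives and 8 cores the ratio is 0.75, which is the "less than one" Mike refers to.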

I appreciate that you want to maximize performance, but when it comes to tuning, you have to start before you get your hardware.

You are asking a question about tuning, but how can we tell whether the numbers are OK?
Have you looked at your GC logs and enabled MSLAB? We don't know. What about your network configuration?

I mean that a lot of information is missing, and fine-tuning a cluster is something you have to do on your own. I guess I could say your numbers look fine to me for that config... but honestly, it would be a SWAG.
Sent from a remote device. Please excuse any typos...

Mike Segel

On Feb 1, 2012, at 7:09 AM, Tim Robertson <[EMAIL PROTECTED]> wrote:

> Thanks Michael,
>
> It's a small cluster, but is the hardware that bad? We are particularly
> interested in relatively low load for random reads and writes (2000
> transactions per second on <1k rows) but a decent full-table scan
> speed, as we aim to mount Hive tables on HBase-backed tables.
>
> Regarding tuning... I'm not exactly sure which settings you would be
> interested in seeing. The config is all here:
> http://code.google.com/p/gbif-common-resources/source/browse/#svn%2Fcluster-puppet%2Fmodules%2Fhadoop%2Ftemplates
>
> Cheers,
> Tim
>
>
>
> On Wed, Feb 1, 2012 at 1:56 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>> No.
>> What tuning did you do?
>> Why such a small cluster?
>>
>> Sorry, but when you start off with a bad hardware configuration, you can get Hadoop/HBase to work, but performance will always be sub-optimal.
>>
>>
>>
>> Sent from my iPhone
>>
>> On Feb 1, 2012, at 6:52 AM, "Tim Robertson" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi all,
>>>
>>> We have a 3-node cluster (CDH3u2) with the following hardware:
>>>
>>> RegionServers (+DN + TT)
>>>  CPU: 2x Intel(R) Xeon(R) CPU E5630 @ 2.53GHz (quad)
>>>  Disks: 6x250G SATA 5.4K
>>>  Memory: 24GB
>>>
>>> Master (+ZK, JT, NN)
>>>  CPU: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz, 2x6MB (quad)
>>>  Disks: 2x500G SATA 7.2K
>>>  Memory: 8GB
>>>
>>> Memory-wise, we have:
>>> Master:
>>>  NN: 1GB
>>>  JT: 1GB
>>>  HBase master: 6GB
>>>  ZK: 1GB
>>> RegionServers:
>>>  RegionServer: 6GB
>>>  TaskTracker: 1GB
>>>  11 Mappers @ 1GB each
>>>  7 Reducers @ 1GB each
>>>
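Tim's memory list can be checked the same way. This editorial sketch adds up the configured maximum heap sizes on one RegionServer node. It assumes every JVM grows to its configured maximum. It also ignores the DataNode heap (not listed in the message), JVM memory outside the heap, and the OS page cache, all of which would push the real total higher.

```python
# Configured maximum heap sizes (GB) on one RegionServer node, from the message.
PHYSICAL_RAM_GB = 24
HEAPS_GB = {
    "RegionServer": 6,
    "TaskTracker": 1,
    "map slots (11 x 1GB)": 11 * 1,
    "reduce slots (7 x 1GB)": 7 * 1,
}


def worst_case_heap(heaps: dict[str, int]) -> int:
    """Sum of max heaps, i.e. the worst case where every JVM is fully grown.

    Assumption: the DataNode heap, non-heap JVM memory and OS page cache
    are not included, so this understates real memory demand.
    """
    return sum(heaps.values())


total = worst_case_heap(HEAPS_GB)
headroom = PHYSICAL_RAM_GB - total
print(f"configured heap: {total} GB of {PHYSICAL_RAM_GB} GB RAM, "
      f"headroom {headroom} GB")
```

Under that worst-case assumption the listed heaps add up to 25 GB on a 24 GB machine. A fully loaded node could therefore start swapping, which is generally advised against for HBase RegionServers.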
>>> HDFS was empty, and I ran randomWrite and scan, both with the number of
>>> clients set to 50 (it seemed to spawn 500 mappers though...)
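One possible explanation for the 500 mappers is that PerformanceEvaluation, in MapReduce mode, splits each client's work into several map tasks. The sketch below is a hypothetical model, not PerformanceEvaluation's actual code. It assumes 10 map tasks per client, a default of 1,048,576 rows per client, and integer division of those rows across the tasks. Under those assumptions it reproduces both the 500 mappers and the ROWS=52428500 counter in the job output.

```python
# Hypothetical model of how PerformanceEvaluation (MapReduce mode) divides work.
# ASSUMPTIONS (not confirmed by the thread): 10 map tasks per client,
# 1,048,576 rows per client by default, rows split by integer division.
SPLITS_PER_CLIENT = 10
ROWS_PER_CLIENT = 1024 * 1024


def pe_layout(clients: int,
              rows_per_client: int = ROWS_PER_CLIENT,
              splits: int = SPLITS_PER_CLIENT) -> tuple[int, int]:
    """Return (number of map tasks, total rows processed) under the assumptions above."""
    rows_per_task = rows_per_client // splits  # integer division drops the remainder
    map_tasks = clients * splits
    return map_tasks, map_tasks * rows_per_task


maps, rows = pe_layout(50)
print(maps, rows)  # prints: 500 52428500
```

The integer division is what makes the total 52,428,500 rather than 50 x 1,048,576 = 52,428,800.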
>>>
>>> randomWrite:
>>> 12/02/01 13:27:47 INFO mapred.JobClient:     ROWS=52428500
>>> 12/02/01 13:27:47 INFO mapred.JobClient:     ELAPSED_TIME=84504886
>>>
>>> scan:
>>> 12/02/01 13:42:52 INFO mapred.JobClient:     ROWS=52428500
>>> 12/02/01 13:42:52 INFO mapred.JobClient:     ELAPSED_TIME=8158664
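To compare the two runs, the counters can be normalized per row and per task. This editorial sketch assumes that ELAPSED_TIME is each map task's run time in milliseconds, summed over all tasks, so it measures total task time, not wall-clock time. It uses the 500 mappers Tim observed.

```python
# Job counters copied from the output above.
ROWS = 52_428_500
ELAPSED_MS = {"randomWrite": 84_504_886, "scan": 8_158_664}
MAP_TASKS = 500  # mappers observed in the message

# ASSUMPTION: ELAPSED_TIME is per-task run time in milliseconds, summed over
# all map tasks, so it is total task time rather than wall-clock time.


def per_row_ms(elapsed_ms: int, rows: int) -> float:
    """Average task time spent per row, in milliseconds."""
    return elapsed_ms / rows


def avg_task_seconds(elapsed_ms: int, tasks: int) -> float:
    """Average run time of one map task, in seconds."""
    return elapsed_ms / tasks / 1000.0


for test_name, ms in ELAPSED_MS.items():
    print(f"{test_name}: {per_row_ms(ms, ROWS):.4f} ms/row, "
          f"{avg_task_seconds(ms, MAP_TASKS):.1f} s per task on average")
```

Under this assumption, randomWrite costs roughly 1.6 ms of task time per row and scan roughly 0.16 ms. Wall-clock throughput for the cluster would additionally need the job start and end times, which the message does not include.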
>>>
>>> Would I be correct in thinking that this is way below what is to be
>>> expected of this hardware?
>>> We're setting up Ganglia now to start debugging, but any suggestions
>>> on how to diagnose this would be greatly appreciated.
>>>
>>> Thanks!
>>> Tim
>