HBase >> mail # user >> Heterogeneous cluster

Re: Heterogeneous cluster
Hi Mike,

I totally agree with you. The balancer I wrote is more of a hack than a
production version. I built my cluster from all the computers I could
find around me, from a single-core P4 to 8-core CPUs. I have 8 nodes + 3 ZK.
They are all so different that "normal" load balancing was not very
efficient. They are also all on the same rack (hub), so the "machine, rack,
second rack" distribution does not really work for me.

That's why I built this hack.

When I have enough nodes for 2 or 3 racks, I will most
probably go back to the DefaultLoadBalancer.

Just to give you an example, here is how one of my tables is now balanced:

Regions by Region Server

Region Server          Region Count
http://node4:60030/    11
http://phenom:60030/   37
http://node5:60030/     3
http://node2:60030/    11
http://node3:60030/    55
http://node1:60030/     4
http://node6:60030/     8
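For illustration only, the capacity-weighted idea behind such a balancer can be sketched as a standalone program. This is not JM's actual PerformanceBalancer (which is not posted yet), and the server names and performance scores below are made-up examples: each server simply gets a share of regions proportional to its relative score.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy sketch of capacity-weighted region assignment: each server
 * receives a number of regions proportional to its performance score.
 */
public class WeightedRegionAssignment {

    public static Map<String, Integer> assign(Map<String, Double> capacity,
                                              int totalRegions) {
        double totalCapacity =
            capacity.values().stream().mapToDouble(Double::doubleValue).sum();
        Map<String, Integer> plan = new LinkedHashMap<>();
        int assigned = 0;
        // First pass: floor of each server's proportional share.
        for (Map.Entry<String, Double> e : capacity.entrySet()) {
            int share = (int) Math.floor(totalRegions * e.getValue() / totalCapacity);
            plan.put(e.getKey(), share);
            assigned += share;
        }
        // Second pass: hand the rounding remainder to the fastest servers first.
        List<String> byCapacity = new ArrayList<>(capacity.keySet());
        byCapacity.sort((a, b) -> Double.compare(capacity.get(b), capacity.get(a)));
        for (int i = 0; assigned < totalRegions; i++, assigned++) {
            String s = byCapacity.get(i % byCapacity.size());
            plan.put(s, plan.get(s) + 1);
        }
        return plan;
    }

    public static void main(String[] args) {
        // Hypothetical scores: roughly one point per core.
        Map<String, Double> capacity = new LinkedHashMap<>();
        capacity.put("phenom", 8.0); // 8-core
        capacity.put("node3", 8.0);
        capacity.put("node2", 2.0);
        capacity.put("node4", 2.0);
        capacity.put("node1", 1.0);  // old single-core P4
        System.out.println(assign(capacity, 100));
    }
}
```

A real HBase balancer would instead implement the LoadBalancer interface and emit RegionPlans, but the weighting logic is the same idea.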

Also, can someone confirm the recommended number of tasks per server?
I think I saw something like CPU cores * 0.7. Is that correct?
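For reference, that rule of thumb is usually applied per node through the slot settings in mapred-site.xml (Hadoop 1.x), so an 8-core machine would get roughly 5-6 map slots. The exact values below are only an illustration of the idea, not a recommendation:

```xml
<!-- mapred-site.xml on a hypothetical 8-core node: ~0.7 slots per core -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>5</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>3</value>
</property>
```

Since each node carries its own copy of this file, a heterogeneous cluster can give each machine a slot count matching its capacity, as Mike describes below.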


2012/12/8, Robert Dyer <[EMAIL PROTECTED]>:
> I of course cannot speak for Jean-Marc; however, my use case is not very
> corporate. It is a small cluster (9 nodes) and only 1 of those nodes is
> different (drastically different).
> And yes, I configured it so that node has a lot more map slots. However,
> the problem is that HBase balances without regard to that, and thus even
> though more map tasks run on that node, they are not data-local! If I have
> a balancer that is able to keep more regions on that particular node, then
> the data locality of my map tasks is improved.
> On Sat, Dec 8, 2012 at 5:45 PM, Michael Segel wrote:
>> Take what I say with a grain of kosher salt. (It's what they put on your
>> drink glasses because the grains are bigger. ;-)
>> I think what you are doing is a cool hack; however, in the bigger picture,
>> you shouldn't have to do this with your load balancer, no matter how you
>> think about it.
>> With a heterogeneous cluster, you will not share the same configuration
>> across all machines in the cluster. You will change the number of slots
>> per node based on its capacity.
>> That will limit how much work can be done on the same cluster.
>> You could also consider playing with the rack-aware aspects of your
>> cluster.
>> You could put all of your 2-CPU machines in the same rack.
>> In theory... machine, rack, second rack is how the data is distributed.
>> In theory, if the 2-CPU machines are neighbors, then the 2nd and/or 3rd
>> copy goes to another machine.
>> Trying to write a custom balancer may be a good hack, but it is not good
>> in terms of corporate life.
>> Just saying!
>> -Mike
>> On Dec 8, 2012, at 1:34 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]>
>> wrote:
>> > Hi,
>> >
>> > It's not yet available anywhere. I will post it today or tomorrow,
>> > just the time to remove some hardcoding I did in it ;) It's a quick
>> > and dirty PerformanceBalancer. It's not a CPULoadBalancer.
>> >
>> > Anyway, I will give more details over the weekend, but there is
>> > absolutely nothing extraordinary about it.
>> >
>> > JM
>> >
>> > 2012/12/8, Robert Dyer <[EMAIL PROTECTED]>:
>> >> I too am interested in this custom load balancer, as I was actually
>> >> just
>> >> starting to look into writing one that does the same thing for
>> >> my heterogeneous cluster!
>> >>
>> >> Is this available somewhere?
>> >>
>> >> On Sat, Dec 8, 2012 at 9:17 AM, James Chang <[EMAIL PROTECTED]>
>> >> wrote:
>> >>
>> >>>     By the way, I saw you mentioned that you
>> >>> have built a "LoadBalancer", could you kindly
>> >>> share some detailed info about it?
>> >>>
>> >>> Jean-Marc Spaggiari wrote on Saturday, December 8, 2012:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> Here is the situation.
>> >>>>
>> >>>> I have a heterogeneous cluster with 2-core, 4-core and 8-core
>> >>>> CPU servers. The performance of those different servers allows
>> >>>> them to handle different amounts of load. So far, I built a