HBase >> mail # user >> Heterogeneous cluster


+ Jean-Marc Spaggiari 2012-12-08, 03:32
+ Asaf Mesika 2012-12-09, 10:08
+ James Chang 2012-12-08, 15:17
+ Robert Dyer 2012-12-08, 18:38
Re: Heterogeneous cluster
Hi,

It's not yet available anywhere. I will post it today or tomorrow;
I just need to remove some hardcoding I did in it ;) It's a quick
and dirty PerformanceBalancer, not a CPULoadBalancer.

Anyway, I will give more details over the weekend, but there is
absolutely nothing extraordinary about it.

JM
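For readers following along, the core idea of a performance-weighted balancer can be sketched in plain Java. This is a hypothetical illustration, not JM's actual PerformanceBalancer code; a real implementation would plug into HBase's LoadBalancer interface rather than compute counts standalone:

```java
import java.util.*;

// Hypothetical sketch: distribute regions across servers in proportion
// to a per-server performance weight (e.g. the number of CPU cores),
// which is the kind of math a performance-based balancer needs.
public class PerformanceBalancerSketch {
    // Returns how many regions each server should host, proportional
    // to its weight; leftover regions go to the heaviest servers first.
    static int[] targetRegionCounts(int[] weights, int totalRegions) {
        int totalWeight = 0;
        for (int w : weights) totalWeight += w;
        int[] counts = new int[weights.length];
        int assigned = 0;
        for (int i = 0; i < weights.length; i++) {
            counts[i] = weights[i] * totalRegions / totalWeight; // floor
            assigned += counts[i];
        }
        // Hand out the remainder to the highest-weight servers.
        Integer[] order = new Integer[weights.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> weights[b] - weights[a]);
        for (int i = 0; assigned < totalRegions; i++, assigned++) {
            counts[order[i % order.length]]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 2-core, 4-core and 8-core servers sharing 70 regions.
        int[] counts = targetRegionCounts(new int[]{2, 4, 8}, 70);
        System.out.println(Arrays.toString(counts)); // [10, 20, 40]
    }
}
```

With weights 2/4/8, the 8-core server ends up with four times the regions of the 2-core one, which matches the intent JM describes.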

2012/12/8, Robert Dyer <[EMAIL PROTECTED]>:
> I too am interested in this custom load balancer, as I was actually just
> starting to look into writing one that does the same thing for
> my heterogeneous cluster!
>
> Is this available somewhere?
>
> On Sat, Dec 8, 2012 at 9:17 AM, James Chang <[EMAIL PROTECTED]>
> wrote:
>
>>      By the way, I saw you mentioned that you
>> have built a "LoadBalancer". Could you kindly
>> share some details about it?
>>
>> Jean-Marc Spaggiari wrote on Saturday, December 8, 2012:
>>
>> > Hi,
>> >
>> > Here is the situation.
>> >
>> > I have a heterogeneous cluster with 2-core, 4-core and 8-core CPU
>> > servers. The performance of these different servers allows them to
>> > handle different sizes of load. So far, I have built a LoadBalancer
>> > which balances the regions over those servers based on their
>> > performance, and it's working quite well. The RowCounter went down
>> > from 11 minutes to 6 minutes. However, I can still see tasks
>> > running on some servers while accessing data on other servers, which
>> > overwhelms the bandwidth and slows down the process, since some 2-core
>> > servers are assigned to count rows hosted on 8-core servers.
>> >
>> > I’m looking for a way to “force” the tasks to run on the servers where
>> > the regions are assigned.
>> >
>> > I first tried to reject the tasks in the Mapper setup method when the
>> > data was not local, to see if the tracker would assign them to another
>> > server. No. The task just fails and is mostly not re-assigned. I tried
>> > IOExceptions, RuntimeExceptions and InterruptedExceptions with no
>> > success.
>> >
>> > So now I have 3 possible options.
>> >
>> > The first one is to move from MapReduce to a Coprocessor
>> > EndPoint. Running locally on the RegionServer, it accesses only the
>> > local data and I can manually reject everything that is not local.
>> > Therefore it achieves what I need, but it's not my preferred option
>> > since I would like to keep the MR features.
>> >
>> > The second option is to tell Hadoop where the tasks should be
>> > assigned. Should that be done by HBase? By Hadoop? I don't know.
>> > Where? I don't know either. I have started to look at the JobTracker
>> > and JobInProgress code, but it seems it will be a big task. Also,
>> > doing that will mean I have to re-patch the distributed code each
>> > time I upgrade the version, and I will have to redo everything when
>> > I move from 1.0.x to 2.x…
>> >
>> > The third option is to not process the task if the data is not local.
>> > I mean, in the map method, simply have an if (!local) return; right at
>> > the beginning and just do nothing. This will not work for things like
>> > RowCount, since all the entries are required, but it might work for
>> > some of my use cases where I don't necessarily need all the data to
>> > be processed. It will not be efficient, though: the task will still
>> > scan the entire region.
>> >
>> > My preferred option is definitely the 2nd one, but it also seems to
>> > be the most difficult one. The third one is very easy to implement:
>> > it needs two lines to see if the data is local. But it doesn't work
>> > for all the scenarios, and is more like a dirty fix. The coprocessor
>> > option might be doable too, since I already have all the code for my
>> > MapReduce jobs, so it might be an acceptable option.
>> >
>> > I'm wondering if anyone has already faced this situation and worked
>> > on something, and if not, do you have any other ideas/options to
>> > propose, or can someone point me to the right classes to look at to
>> > implement solution 2?
>> >
>> > Thanks,
>> >
>> > JM
>> >
>>
>
>
>
> --
>
> Robert Dyer
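The third option described in the quoted message (an early if (!local) return; in the map method) hinges on comparing the host running the task with the host serving the region. Here is a minimal, hypothetical sketch of that comparison in plain Java; LocalityGuard and its method names are invented for illustration, and in a real Mapper the region's host would come from the table split rather than a hardcoded string:

```java
import java.net.InetAddress;

// Hypothetical sketch of the locality check behind "option 3":
// skip work in map() when the region is not hosted on the machine
// running the task.
public class LocalityGuard {
    // Returns true when the region's host matches the local hostname,
    // comparing only the short host part (ignores domain and port).
    static boolean isLocal(String regionHost, String localHost) {
        return shortName(regionHost).equalsIgnoreCase(shortName(localHost));
    }

    private static String shortName(String host) {
        int colon = host.indexOf(':');
        if (colon >= 0) host = host.substring(0, colon);
        int dot = host.indexOf('.');
        return dot >= 0 ? host.substring(0, dot) : host;
    }

    public static void main(String[] args) throws Exception {
        String local = InetAddress.getLocalHost().getHostName();
        // In a real Mapper, map() would start with:
        //     if (!isLocal(regionHost, local)) return;
        System.out.println(isLocal("node1.example.com:60020", local));
    }
}
```

As JM notes, this only avoids the processing work; the task still gets scheduled and still scans the region, so it is a dirty fix rather than true locality-aware scheduling.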
+ Michael Segel 2012-12-08, 23:45
+ Robert Dyer 2012-12-08, 23:50
+ Michael Segel 2012-12-09, 08:27
+ Jean-Marc Spaggiari 2012-12-10, 14:03
+ Anoop Sam John 2012-12-11, 04:04
+ Jean-Marc Spaggiari 2012-12-11, 18:48
+ Harsh J 2012-12-11, 20:20
+ Anoop Sam John 2012-12-12, 03:54
+ Jean-Marc Spaggiari 2012-12-09, 02:36