MapReduce >> mail # user >> Re: Assigning reduce tasks to specific nodes


Michael Segel 2012-12-01, 17:57
Jean-Marc Spaggiari 2012-12-07, 19:51
Tsuyoshi OZAWA 2012-12-08, 07:59
Re: Assigning reduce tasks to specific nodes
Thank you all for the comments.

> you ought to make sure your scheduler also does non-strict scheduling of data local tasks for jobs
> that don't require such strictness

I just want to confirm one thing.
If I write my own scheduler, is it possible to do "strict" scheduling?

Thanks

On Thu, Nov 29, 2012 at 1:56 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> Look at the locality delay parameter.
>
> Sent from my iPhone
>
> On Nov 28, 2012, at 8:44 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> None of the current schedulers are "strict" in the sense of "do not
>> schedule the task if such a tasktracker is not available". That has
>> never been a requirement for Map/Reduce programs, nor should it be.
>>
>> I feel if you want some code to run individually on all nodes for
>> whatever reason, you may as well ssh into each one and start it
>> manually with appropriate host-based parameters, etc., and then
>> aggregate their results.
>>
>> Note that even if you get down to writing a scheduler for this (which
>> I don't think is a good idea, but anyway), you ought to make sure your
>> scheduler also does non-strict scheduling of data local tasks for jobs
>> that don't require such strictness, so that they complete
>> quickly rather than wait around for scheduling in a fixed manner.
>>
>> On Thu, Nov 29, 2012 at 6:00 AM, Hiroyuki Yamada <[EMAIL PROTECTED]> wrote:
>>> Thank you all for the comments and advice.
>>>
>>> I know it is not recommended to assign mapper locations myself,
>>> but in some cases there needs to be one mapper running on each node,
>>> so I need a strict way to do it.
>>>
>>> So, locations are taken care of by the JobTracker (scheduler), but not strictly.
>>> And the only way to do it strictly is to write my own scheduler, right?
>>>
>>> I have checked the source, but I am not sure what to modify to do it.
>>> My understanding is that FairScheduler and the others are for scheduling
>>> multiple jobs. Is this right?
>>> What I want to do is schedule tasks within one job.
>>> Can this be achieved with FairScheduler and the others?
>>>
>>> Regards,
>>> Hiroyuki
>>>
>>> On Thu, Nov 29, 2012 at 12:46 AM, Michael Segel
>>> <[EMAIL PROTECTED]> wrote:
>>>> Mappers? Uhm... yes you can do it.
>>>> Yes it is non-trivial.
>>>> Yes, it is not recommended.
>>>>
>>>> I think we talk a bit about this in an InfoQ article written by Boris
>>>> Lublinsky.
>>>>
>>>> It's kind of wild when your entire cluster map goes red in ganglia... :-)
>>>>
>>>>
>>>> On Nov 28, 2012, at 2:41 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Mapper scheduling is indeed influenced by the locations that the
>>>> InputSplit's getLocations() returns.
>>>>
>>>> The map task itself does not care about deserializing the location
>>>> information, as it is of no use to it. The location information is vital to
>>>> the scheduler (or in 0.20.2, the JobTracker), to which it is sent directly
>>>> when a job is submitted. The locations are used pretty well here.
>>>>
>>>> You should be able to control (or rather, influence) mapper placement by
>>>> working with the InputSplits, but not strictly so, because in the end it's up
>>>> to your MR scheduler to do data local or non data local assignments.
>>>>
>>>>
>>>> On Wed, Nov 28, 2012 at 11:39 AM, Hiroyuki Yamada <[EMAIL PROTECTED]>
>>>> wrote:
>>>>>
>>>>> Hi Harsh,
>>>>>
>>>>> Thank you for the information.
>>>>> I understand the current circumstances.
>>>>>
>>>>> How about mappers?
>>>>> As far as I tested, location information in InputSplit is ignored in
>>>>> 0.20.2,
>>>>> so there seems to be no easy way to assign mappers to specific nodes.
>>>>> (I previously checked the source and noticed that
>>>>> location information is not restored when deserializing the InputSplit
>>>>> instance.)
>>>>>
>>>>> Thanks,
>>>>> Hiroyuki
>>>>>
>>>>> On Wed, Nov 28, 2012 at 2:08 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>>>> This is not supported/available currently even in MR2, but take a look
>>>>>> at
>>>>>> https://issues.apache.org/jira/browse/MAPREDUCE-199.
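
The difference between "strict" and non-strict placement discussed in this thread can be sketched with a toy model. This is only an illustration under stated assumptions, not Hadoop code: the class LocalityDemo, the Task type, the accept method and the maxSkips parameter are all hypothetical names invented here, and real schedulers (the JobTracker, FairScheduler and so on) are considerably more involved. In the sketch, a task declines slots on non-preferred hosts until it has declined maxSkips offers, which loosely mirrors the idea of a locality delay; a negative maxSkips models a strict policy that never falls back to another host.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model only: none of these types are Hadoop classes. */
public class LocalityDemo {

    /** A pending task and the hosts it would prefer (compare InputSplit.getLocations()). */
    public static final class Task {
        public final String name;
        public final List<String> preferredHosts;
        public int skippedOffers = 0; // non-local offers declined so far

        public Task(String name, String... preferredHosts) {
            this.name = name;
            this.preferredHosts = Arrays.asList(preferredHosts);
        }
    }

    /**
     * Decide whether the task takes a free slot offered by the given host.
     * maxSkips < 0 : hypothetical "strict" mode, only preferred hosts are accepted.
     * maxSkips >= 0: after that many declined offers, any host is accepted.
     */
    public static boolean accept(Task task, String host, int maxSkips) {
        if (task.preferredHosts.contains(host)) {
            return true; // data-local slot: always take it
        }
        if (maxSkips >= 0 && task.skippedOffers >= maxSkips) {
            return true; // waited long enough: give up on locality
        }
        task.skippedOffers++; // decline and keep waiting for a preferred host
        return false;
    }

    public static void main(String[] args) {
        Task relaxed = new Task("map-0", "node1");
        for (int offer = 1; offer <= 3; offer++) {
            System.out.println("non-strict, offer " + offer + " from node2: "
                    + accept(relaxed, "node2", 2));
        }

        Task strict = new Task("map-1", "node1");
        System.out.println("strict, offer from node2: " + accept(strict, "node2", -1));
        System.out.println("strict, offer from node1: " + accept(strict, "node1", -1));
    }
}
```

Under the strict policy, a task whose preferred host never offers a slot waits indefinitely, which is the cost Harsh J warns about for jobs that do not need such strictness.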