MapReduce >> mail # user >> Re: assign tasks to specific nodes


Re: assign tasks to specific nodes
Potentially you would be able to, but I guess you will have to update the
partitioning code and, correspondingly, the RMContainerAllocator (YARN
MapReduce) code. Today we have the same priority for all map tasks < the same
priority for all reduce tasks. What you can do is change the map task
priorities based on partition size (file size). Make sure that when you
assign priorities to the container requests, the priorities of the containers
for the corresponding map tasks are ordered
apartment > room > villa....
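
To make the two orderings concrete, here is a minimal sketch in plain Python. It is not MapReduce or YARN code: `Split`, `default_order`, `priority_order`, and the byte sizes are hypothetical, and the convention that priority 1 runs first is taken from the Apartment=1, Room=2, Villa=3 example quoted further down in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for one input partition (one file)."""
    name: str
    size_bytes: int


def default_order(splits):
    """Largest partition first, the default behaviour described in the thread."""
    return sorted(splits, key=lambda s: s.size_bytes, reverse=True)


def priority_order(splits, priorities):
    """Order by user-assigned priority (1 runs first).

    Partitions without an assigned priority go last, largest first.
    """
    fallback = max(priorities.values(), default=0) + 1
    return sorted(
        splits,
        key=lambda s: (priorities.get(s.name, fallback), -s.size_bytes),
    )


# Sizes are invented for illustration; only their relative order matters.
splits = [Split("Villa", 300), Split("Apartment", 200), Split("Room", 100)]
priorities = {"Apartment": 1, "Room": 2, "Villa": 3}

print([s.name for s in default_order(splits)])
print([s.name for s in priority_order(splits, priorities)])
```

In a real change, the per-partition priority would have to be carried from the partitioning code to the container requests, which is the part that needs the RMContainerAllocator modification mentioned above.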

However, you should note a few things here, and I have a few questions for
you.
1) I don't see why you want to do this; for your task to succeed you
will need all of the map tasks to finish anyway. Why do you want this
ordering? Any benefits?
2) Even if you submit all the requests with the specified priorities, you are
not guaranteed to get them in the same order, because most of these requests
are for specific host machines (node managers), so we don't know in advance
whether sufficient resources will be available there.
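
Point 2 can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how the YARN scheduler is implemented: every request is pinned to one host, each host has a single slot, and `grant_order`, the host names, and the tick values are all hypothetical.

```python
def grant_order(requests, free_at, task_ticks=1):
    """Return task names in the order their containers are granted.

    requests:   (priority, task, host) tuples; a lower number is more urgent.
    free_at:    host -> tick at which that host's single slot becomes free.
    task_ticks: how long a granted task keeps the slot busy.
    """
    free_at = dict(free_at)      # work on a copy, leave the caller's dict alone
    pending = sorted(requests)   # most urgent request is considered first
    granted = []
    tick = 0
    while pending:
        for request in list(pending):
            _, task, host = request
            if free_at[host] <= tick:
                granted.append(task)
                pending.remove(request)
                free_at[host] = tick + task_ticks
        tick += 1
    return granted


requests = [(1, "Apartment", "nodeA"), (2, "Room", "nodeB"), (3, "Villa", "nodeC")]

# nodeA is busy until tick 2, so the most urgent request is granted last.
print(grant_order(requests, {"nodeA": 2, "nodeB": 0, "nodeC": 1}))

# With every host free at tick 0, the grants follow the priorities.
print(grant_order(requests, {"nodeA": 0, "nodeB": 0, "nodeC": 0}))
```

The priorities only decide which pending request is looked at first; whether it can be granted still depends on the host it asks for.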

Thanks,
Omkar Joshi
*Hortonworks Inc.* <http://www.hortonworks.com>
On Wed, Sep 11, 2013 at 4:08 PM, Mark Olimpiati <[EMAIL PROTECTED]> wrote:

> Hi Vinod, I asked about node assignment at first, but in my second email I
> explained how I want to change the order in which data partitions are
> executed. The default is to run tasks based on the *size* of the partition
> assigned to each. Now I want to run tasks such that the partitions are
> executed in a specific order.
>
> E.g., first assume the input is the directory Houses/ with files {Villa,
> Apartment, Room} such that file "Villa" is larger than "Apartment", which is
> larger than "Room".
>
> Default Hadoop would run:
> map1 --> Villa
> map2 --> Apartment
> map3 --> Room
>
> I want to assign priorities to the *data partitions* such that
> Apartment=1, Room=2, Villa=3, so that the scheduler runs the following in
> this order:
> map1 --> Apartment
> map2 --> Room
> map3 --> Villa
>
> My question: is that possible? Note that this is regardless of the assigned
> node.
> Thank you,
> Mark
>
>
> On Wed, Sep 11, 2013 at 10:45 AM, Vinod Kumar Vavilapalli <
> [EMAIL PROTECTED]> wrote:
>
>>
>> I assume you are talking about MapReduce. And 1.x release or 2.x?
>>
>> In either of the releases, this cannot be done directly.
>>
>> In 1.x, the framework doesn't expose a feature like this as it is a
>> shared service, and if enough jobs flock to a node, it will lead to
>> utilization and failure handling issues.
>>
>> In Hadoop 2 YARN, the platform does expose this functionality, but the
>> MapReduce framework doesn't yet expose it to end users.
>>
>> What exactly is your use case? Why are some nodes of higher priority than
>> others?
>>
>>  Thanks,
>> +Vinod Kumar Vavilapalli
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>> On Sep 11, 2013, at 10:09 AM, Mark Olimpiati wrote:
>>
>> Thanks for replying Rev, but the link is talking about reducers, which
>> seems like a similar case. But what if I assigned priorities to the
>> data partitions (e.g. partition B=1, partition C=2, partition A=3, ...) such
>> that the first map task is assigned partition B to run first, then the
>> second map is given partition C, etc.? This is instead of assigning based
>> on partition size. Is that possible?
>>
>> Thanks,
>> Mark
>>
>>
>> On Mon, Sep 9, 2013 at 11:17 AM, Ravi Prakash <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> http://lucene.472066.n3.nabble.com/Assigning-reduce-tasks-to-specific-nodes-td4022832.html
>>>
>>>   ------------------------------
>>>  *From:* Mark Olimpiati <[EMAIL PROTECTED]>
>>> *To:* [EMAIL PROTECTED]
>>> *Sent:* Friday, September 6, 2013 1:47 PM
>>> *Subject:* assign tasks to specific nodes
>>>
>>> Hi guys,
>>>
>>>    I'm wondering if there is a way for me to assign tasks to specific
>>> machines or at least assign priorities to the tasks to be executed in that
>>> order. Any suggestions?
>>>
>>> Thanks,
>>> Mark
>>>
>>>
>>>
>>
>>

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.