HDFS, mail # user - Re: Fair Scheduler is not Fair why?


Re: Fair Scheduler is not Fair why?
Jeff Bean 2013-01-17, 18:35
Hi Dhanasekaran,

The issue is not with Hadoop streaming. You can try this yourself:

On your local disk, touch a bunch of files, like this:

mkdir stream
cd stream
touch 1 2 3 4 5 6 7 8 9 10

Then, put the files into HDFS:

hadoop fs -put stream stream

Now, put a unix sleep command into a shell script and make it executable:

echo sleep 10 > sleepten.sh
chmod +x sleepten.sh

Now you have all the ingredients you need to submit a hadoop streaming
sleep job to test the scheduler.

Submit sleepten.sh as the mapper with input directory stream, and hadoop
streaming will launch ten mappers (one per file).

Here's what I did:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -D mypool=research -input stream -output bar \
  -mapper ./sleepten.sh -file ./sleepten.sh

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -D mypool=tech -input stream -output baz \
  -mapper ./sleepten.sh -file ./sleepten.sh

I have my cluster configured with the pool name property set to "mypool".
This launches two jobs of 10 mappers each, and the scheduler fairly divides
the tasks between research and tech.
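For reference, the "pool name property" mentioned above could be configured with something like the following sketch, assuming the MR1 Fair Scheduler (the property value "mypool" matches the -D flag used in the commands above):

```xml
<!-- mapred-site.xml: tell the Fair Scheduler to read the pool name
     from the job's "mypool" property, as set via -D mypool=... -->
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>mypool</value>
</property>
```

With this in place, any job submitted with -D mypool=research lands in the research pool, and jobs that don't set the property fall into the default pool.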

Jeff

On Wed, Jan 16, 2013 at 10:07 AM, Dhanasekaran Anbalagan <[EMAIL PROTECTED]
> wrote:

> Hi Jeff,
>
> Thanks for your kind mail. I have tested the sleep job and it works pretty
> well. But when we test with a Hadoop streaming job, it does not follow the
> fair scheduling algorithm properly. Why? Is there any other way to test a
> Hadoop streaming job with the fair scheduler?
>
> Note:
> Tested with RHadoop using rmr.
>
> -Dhanasekaran.
>
> Did I learn something today? If not, I wasted it.
>
>
> On Wed, Jan 16, 2013 at 12:02 PM, Jeff Bean <[EMAIL PROTECTED]> wrote:
>
>> Validate your scheduler capacity and behavior by using sleep jobs. Submit
>> sleep jobs to the pools that mirror your production jobs and just check
>> that the scheduler pool allocation behaves as you expect. The nice thing
>> about sleep is that you can mimic your real jobs: numbers of tasks and how
>> long they run.
>>
>> You should be able to determine whether the hypothesis posed on this thread
>> is correct: that all the slots are taken by other tasks. Indeed, your UI
>> says that research has 90 running tasks after having completed over 4000,
>> but your email says no tasks are scheduled. I'm a little confused.
>>
>> Jeff
>>
>>
>> On Wed, Jan 16, 2013 at 8:50 AM, Nan Zhu <[EMAIL PROTECTED]> wrote:
>>
>>> BTW, what I mentioned is fair share preemption, not minimum share
>>>
>>> an alternative way to achieve that is to set the minimum shares of the two
>>> queues to be equal (or to any other allocation scheme you like), with their
>>> sum equal to the capacity of the cluster, and enable minimum share preemption
>>>
>>> Good Luck!
>>>
>>> Best,
>>>
>>> --
>>> Nan Zhu
>>> School of Computer Science,
>>> McGill University
>>>
>>>
>>> On Wednesday, 16 January, 2013 at 11:43 AM, Nan Zhu wrote:
>>>
 I think you should do that, so that when the allocation is
>>> inconsistent with the fair share, tasks in the queue that occupies more
>>> than its fair share will be killed, and the freed slots will be
>>> assigned to the other one (assuming their weights are the same)
>>>
>>> Best,
>>>
>>> --
>>> Nan Zhu
>>> School of Computer Science,
>>> McGill University
>>>
>>>
>>> On Wednesday, 16 January, 2013 at 11:32 AM, Dhanasekaran Anbalagan wrote:
>>>
>>> HI Nan,
>>>
>>> We have not enabled Fair Scheduler Preemption.
>>>
>>> -Dhanasekaran.
>>>
>>> Did I learn something today? If not, I wasted it.
>>>
>>>
>>> On Wed, Jan 16, 2013 at 11:21 AM, Nan Zhu <[EMAIL PROTECTED]> wrote:
>>>
>>>  have you enabled task preemption?
>>>
>>> Best,
>>>
>>> --
>>> Nan Zhu
>>> School of Computer Science,
>>> McGill University
>>>
>>>
>>> On Wednesday, 16 January, 2013 at 10:45 AM, Justin Workman wrote:
>>>
Looks like the weight for both pools is equal and all map slots are used.
>>> Therefore I don't believe either pool has priority for the next slots. Try
>>> setting the research weight to 2. This should allow research to take slots
>>> as tech releases them.
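In the allocation file, that weight change would look something like this sketch (pool names assumed from the thread; the Fair Scheduler hands out free slots in proportion to these weights):

```xml
<allocations>
  <!-- research gets twice tech's share of newly freed slots -->
  <pool name="research">
    <weight>2.0</weight>
  </pool>
  <pool name="tech">
    <weight>1.0</weight>
  </pool>
</allocations>
```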
>>>
>>> Sent from my iPhone
>>>
>>> On Jan 16, 2013, at 8:26 AM, Dhanasekaran Anbalagan <[EMAIL PROTECTED]>