MapReduce >> mail # user >> Fair scheduler.


Patai Sangbutsarakum 2012-10-14, 00:33
Harsh J 2012-10-14, 03:30
Patai Sangbutsarakum 2012-10-15, 21:27
Harsh J 2012-10-15, 22:18
Re: Fair scheduler.
Hi Harsh,
Thanks for breaking it down clearly. I would say I am 98% successful
following the instructions.
The remaining 2% is about hadoop.tmp.dir.

Let's say I have 2 users:
userA is the user that starts HDFS and MapReduce
userB is a regular user

If I use the default value of hadoop.tmp.dir,
/tmp/hadoop-${user.name},
I can submit jobs as userA but not as userB:
user=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging":userA:supergroup:drwxr-xr-x

I googled around; someone recommended changing hadoop.tmp.dir to /tmp/hadoop.
That way almost works; the thing is:

if I submit as userA, it creates /tmp/hadoop on the local machine with
ownership userA:userA, and once I try to submit a job from the same
machine as userB I get "Error creating temp dir in hadoop.tmp.dir
/tmp/hadoop due to Permission denied" (because /tmp/hadoop is owned by
userA:userA). Vice versa, if I delete /tmp/hadoop and let the directory
be created by userB, then userA will not be able to submit jobs.
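[Editor's note] The clash described above can be reproduced without a real cluster. This is a minimal sketch using only directory mode bits (userA/userB are simulated, not real accounts): whichever user creates the shared directory first gets it as drwxr-xr-x, so everyone else is missing the write bit.

```python
import os
import tempfile

# Sketch of the ownership clash: with hadoop.tmp.dir fixed at /tmp/hadoop,
# the first submitter creates the directory with mode 755 (drwxr-xr-x),
# so a second user has no write permission and job submission fails.
base = tempfile.mkdtemp()
shared = os.path.join(base, "hadoop")   # stands in for /tmp/hadoop
os.mkdir(shared, 0o755)                 # created by the first user, "userA"

mode = os.stat(shared).st_mode
others_can_write = bool(mode & 0o002)   # what a second user, "userB", gets
print("userB write access:", others_can_write)
```

The per-user default `/tmp/hadoop-${user.name}` avoids this clash precisely because each user creates and owns a separate directory.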

Which is the right approach to work with?
Please suggest.

Patai
On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> Hi Patai,
>
> Reply inline.
>
> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
> <[EMAIL PROTECTED]> wrote:
>> Thanks for input,
>>
>> I am reading the document; I forgot to mention that I am on CDH3u4.
>
> That version should have the support for all of this.
>
>>> If you point your poolname property to mapred.job.queue.name, then you
>>> can leverage the Per-Queue ACLs
>>
>> Does that mean that if I plan on 3 pools in the fair scheduler, I
>> have to configure 3 queues in the capacity scheduler, so that each
>> pool can leverage the per-queue ACL of its queue?
>
> Queues are not hard-tied into CapacityScheduler. You can have generic
> queues in MR. And FairScheduler can bind its Pool concept into the
> Queue configuration.
>
> All you need to do is the following:
>
> 1. Map FairScheduler pool name to reuse queue names itself:
>
> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>
> 2. Define your required queues:
>
> mapred.queue.names set to "default,foo,bar" for example, for 3 queues:
> default, foo and bar.
>
> 3. Define Submit ACLs for each Queue:
>
> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
> (usernames groupnames)
>
> mapred.queue.foo.acl-submit-job set to "spam eggs"
>
> Likewise for remaining queues, as you need it…
>
> 4. Enable ACLs and restart JT.
>
> mapred.acls.enabled set to "true"
>
> 5. Users then use the right API to set queue names before submitting
> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf.html#setQueueName(java.lang.String)
>
> 6. Done.
>
> Let us know if this works!
>
> --
> Harsh J
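[Editor's note] Harsh's steps 1-4 can be collected into a single configuration sketch. Queue names and ACL user/group lists below are the illustrative values from the thread; on Hadoop 1.x/CDH3 these properties would go in mapred-site.xml (the queue-list key is spelled mapred.queue.names there), and the JobTracker must be restarted afterwards.

```xml
<!-- Sketch only: queue names and "users groups" ACL values are examples. -->
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>mapred.job.queue.name</value>   <!-- step 1: pool name == queue name -->
</property>
<property>
  <name>mapred.queue.names</name>
  <value>default,foo,bar</value>         <!-- step 2: define the queues -->
</property>
<property>
  <name>mapred.queue.default.acl-submit-job</name>
  <value>patai,foobar users,adm</value>  <!-- step 3: "usernames groupnames" -->
</property>
<property>
  <name>mapred.queue.foo.acl-submit-job</name>
  <value>spam eggs</value>
</property>
<property>
  <name>mapred.acls.enabled</name>
  <value>true</value>                    <!-- step 4: enable ACLs, restart JT -->
</property>
```

For step 5, a job then targets a queue either with `JobConf.setQueueName("foo")` in code or with `-Dmapred.job.queue.name=foo` on the command line (if the job implements Tool).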
Arpit Gupta 2012-10-16, 23:12
Patai Sangbutsarakum 2012-10-16, 23:52
Harsh J 2012-10-17, 07:00
Goldstone, Robin J. 2012-10-17, 15:09
Harsh J 2012-10-17, 15:43
Patai Sangbutsarakum 2012-10-17, 17:40
Harsh J 2012-10-17, 17:53
Luke Lu 2012-10-18, 10:11