Nitin Pawar 2013-04-25, 10:04
Sagar Mehta 2013-04-26, 17:27
Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue
Sandy Ryza 2013-04-26, 18:38
I'm glad to hear that it would help. Unfortunately, we are no longer
adding features to CDH3, so you would have to upgrade to CDH4 or backport
it yourself to use it.
On Fri, Apr 26, 2013 at 10:27 AM, Sagar Mehta <[EMAIL PROTECTED]> wrote:
> Hi Sandy,
> Thanks for your prompt reply!!
> The jira that you pointed out would make it easy for us to do the
> automatic mapping and get closer to enforcing a policy
> automatically. Any idea when it will be incorporated into cdh/hadoop
> releases, and whether it could be back-ported to cdh3u2, which we
> currently run in production?
> Currently we get around this by passing -Dmapred.job.queue.name="X"
> and then mapping the map-red job queue to a Fair-share scheduler
> pool. We use ACLs [more of a white-list], configured in
> mapred-queue-acls.xml, to ensure people can only submit to the
> right queue.
> *Two limitations of this round-about approach are*
> 1. It is manual
> 2. It exposes the policy where user A is asked to submit jobs to queue
> X and user B is asked to submit jobs to queue Y [with different scheduler
> properties]. We want this to be completely transparent to the user of our
> The jira above would be a great first step towards such automatic mapping!!
> On Wed, Apr 24, 2013 at 11:41 PM, Sandy Ryza <[EMAIL PROTECTED]> wrote:
>> Hi Sagar,
>> This capability currently does not exist in the fair scheduler (or other
>> schedulers, as far as I know), but a JIRA has been filed recently that
>> addresses a similar need. Would
>> https://issues.apache.org/jira/browse/MAPREDUCE-5132 work for what
>> you're trying to do? If not, would you mind filing a new JIRA for the
>> functionality you'd want?
>> On Wed, Apr 24, 2013 at 6:22 PM, Sagar Mehta <[EMAIL PROTECTED]> wrote:
>>> Hi Guys,
>>> We have a general purpose Hive cluster [about 200 nodes] which is used
>>> for various jobs like
>>> - Production
>>> - Experimental/Research
>>> - Adhoc queries
>>> We are using the fair-share scheduler to schedule them and for this we
>>> have 3 corresponding pools in the scheduler.
>>> *Here is what we want.*
>>> *A hive query submitted by a user with user-name A should go to one of
>>> the pools above based on a pre-defined mapping. We are wondering where/how
>>> to specify this mapping?*
>>> *We can do this manually by adding -Dmapred.job.queue.name="X" on a
>>> particular job run.*
>>> This puts the job on the map-reduce queue named "X" and the following
>>> configuration in the fair-share scheduler
>>> maps this to a pool named "X" in the fair-share scheduler.
>>> However, to enforce a cluster-use policy, we [while wearing our Hadoop
>>> developer/admin hat] don't want the user/analyst to have to specify
>>> that. Based on his/her username, we want to automatically select which
>>> hadoop queue, and subsequently which fair-share scheduler pool, his/her
>>> job should go to. I'm pretty sure this is a common use-case and am
>>> wondering how to do this in Hadoop.
>>> Any help/insights/pointers would be greatly appreciated.
>>> PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
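
The manual workaround discussed in this thread (setting mapred.job.queue.name per job) could be hidden from analysts behind a small wrapper until scheduler-side support is available. The following is only a sketch under stated assumptions, not the feature proposed in MAPREDUCE-5132: the user names, queue names, and the helper functions queue_for and hive_command are hypothetical, and it assumes jobs are launched through the Hive CLI with its --hiveconf and -e options.

```python
import getpass

# Hypothetical username-to-queue mapping; every name here is illustrative.
USER_TO_QUEUE = {
    "etl": "production",
    "alice": "research",
}

# Hypothetical fallback queue for users with no explicit mapping.
DEFAULT_QUEUE = "adhoc"


def queue_for(user):
    """Return the queue a user's jobs should be submitted to."""
    return USER_TO_QUEUE.get(user, DEFAULT_QUEUE)


def hive_command(query, user=None):
    """Build a Hive CLI argument list that pins the job to the user's queue.

    The queue is passed through mapred.job.queue.name, the same property
    that is set by hand in the thread above.
    """
    user = user or getpass.getuser()
    return [
        "hive",
        "--hiveconf", "mapred.job.queue.name=" + queue_for(user),
        "-e", query,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(hive_command("SELECT 1")))
```

A wrapper like this only removes the manual step. Users can still bypass it and set the property themselves, so the queue ACLs in mapred-queue-acls.xml would remain the actual enforcement point.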