HDFS, mail # user - Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue


Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue
Nitin Pawar 2013-04-25, 10:04
The current capacity scheduler controls which users can submit jobs
to which queue, along with other related features.
You can read more about it at
http://hadoop.apache.org/docs/stable/capacity_scheduler.html

But on the Hive side, unless you set mapred.job.queue.name in the Hive CLI,
the jobs will be submitted to the default job queue.

So basically what you would like to do is create the user, associate it with
a queue in the scheduler, and ask the user to set that queue in their local
hiverc file.
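
As a rough sketch of that idea (not something Hive ships), an admin could
generate the hiverc line for each user from a mapping. The user names, queue
names and the fallback queue below are made up for illustration; only the
mapred.job.queue.name property comes from this thread.

```python
import getpass

# Hypothetical user-to-queue mapping; the user and queue names are made up.
USER_QUEUE = {
    "alice": "production",
    "bob": "research",
}
DEFAULT_QUEUE = "adhoc"  # assumed fallback for users not listed above


def hiverc_line(user):
    """Return the hiverc statement that points this user's jobs at a queue."""
    queue = USER_QUEUE.get(user, DEFAULT_QUEUE)
    return "SET mapred.job.queue.name={};".format(queue)


if __name__ == "__main__":
    # Print the line for whoever runs the script; redirect it into hiverc.
    print(hiverc_line(getpass.getuser()))
```

Note that this only sets a default on the client side. The user can still
override it in the session, so access control stays with the scheduler.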

I am not sure this can be part of Hive's metastore, because one user can
be allowed to submit jobs to multiple queues. In that case the best way to
handle it is to set the property each time you open a session, or via the
hiverc file.
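
For the per-session case, a small wrapper could build the hive command line
with the queue passed through --hiveconf. This is only a sketch under
assumptions: the allowed-queues table and its names are hypothetical, and the
check here is a client-side convenience, not enforcement.

```python
# Hypothetical table of queues each user may submit to (names are made up).
ALLOWED_QUEUES = {
    "alice": ["production", "adhoc"],
    "bob": ["research"],
}


def hive_command(user, queue, query):
    """Build a hive CLI argument list that sets the queue for one session."""
    if queue not in ALLOWED_QUEUES.get(user, []):
        raise ValueError("{} may not submit to queue {}".format(user, queue))
    # --hiveconf sets a property for this invocation; -e runs a query string.
    return ["hive", "--hiveconf", "mapred.job.queue.name=" + queue, "-e", query]
```

The returned list can be handed to subprocess.call() as is; a user who is
allowed several queues simply picks one per invocation.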

On Thu, Apr 25, 2013 at 12:11 PM, Sandy Ryza <[EMAIL PROTECTED]> wrote:

> Hi Sagar,
>
> This capability currently does not exist in the fair scheduler (or other
> schedulers, as far as I know), but a JIRA has been filed recently that
> addresses a similar need. Would
> https://issues.apache.org/jira/browse/MAPREDUCE-5132 work for what you're
> trying to do?  If not, would you mind filing a new JIRA for the
> functionality you'd want?
>
> -Sandy
>
>
> On Wed, Apr 24, 2013 at 6:22 PM, Sagar Mehta <[EMAIL PROTECTED]> wrote:
>
>> Hi Guys,
>>
>> We have a general purpose Hive cluster [about 200 nodes] which is used
>> for various jobs like
>>
>>    - Production
>>    - Experimental/Research
>>    - Adhoc queries
>>
>> We are using the fair-share scheduler to schedule them and for this we
>> have corresponding 3 pools in the scheduler.
>>
>> *Here is what we want.*
>>
>> *A hive query submitted by a user with user-name A should go to one of
>> the pools above based on a pre-defined mapping. We are wondering where/how
>> to specify this mapping?*
>>
>> *We can do this manually by adding -Dmapred.job.queue.name="X" on a
>> particular job run.*
>>
>> This puts the job on the map-reduce queue named "X" and the following
>> configuration in the fair-share scheduler
>>
>>   <property>
>>     <name>mapred.fairscheduler.poolnameproperty</name>
>>     <value>mapred.job.queue.name</value>
>>   </property>
>>
>> maps this to a pool named "X" in the fair-share scheduler.
>>
>> However we [while wearing our Hadoop developer/admin hat] don't want the
>> user/analyst to specify that so as to enforce some cluster-use policy.
>>
>> Based on his/her username we want to automatically select which hadoop
>> queue and subsequently which fair-share scheduler pool, his/her job should
>> go to. I'm pretty sure this is a common use-case and wondering how to do
>> this in Hadoop.
>>
>> Any help/insights/pointers would be greatly appreciated.
>>
>> Sagar
>> PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
>>
>>
>>
>>
>
--
Nitin Pawar