MapReduce >> mail # user >> Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue


Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue
Hi Guys,

We have a general-purpose Hive cluster [about 200 nodes] which is used for
various kinds of jobs, such as:

   - Production
   - Experimental/Research
   - Ad hoc queries

We are using the fair-share scheduler to schedule them, and for this we have
three corresponding pools in the scheduler.

*Here is what we want.*

*A hive query submitted by a user with user-name A should go to one of the
pools above based on a pre-defined mapping. We are wondering where/how to
specify this mapping?*

*We can do this manually by adding -Dmapred.job.queue.name="X" to a
particular job run.*

This puts the job on the map-reduce queue named "X", and the following
configuration in the fair-share scheduler

  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>mapred.job.queue.name</value>
  </property>

maps this to a pool named "X" in the fair-share scheduler.
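
To make the indirection above concrete, here is a minimal sketch of the pool
lookup as a simplified model. It is not the scheduler's actual code. The
function name resolve_pool, the "default" fallback pool, and the "user.name"
fallback property are assumptions made for illustration.

```python
# Simplified model of fair-share pool selection; not the scheduler's real code.
POOL_NAME_PROPERTY_KEY = "mapred.fairscheduler.poolnameproperty"
DEFAULT_POOL = "default"  # assumed fallback pool name


def resolve_pool(scheduler_conf, job_conf):
    """Return the pool a job would land in under this simplified model."""
    # The scheduler setting names the jobconf property that holds the pool name.
    # Assumption: when the setting is absent, "user.name" is used instead.
    source_property = scheduler_conf.get(POOL_NAME_PROPERTY_KEY, "user.name")
    # The job's value for that property becomes the pool name.
    return job_conf.get(source_property, DEFAULT_POOL).strip()


scheduler_conf = {POOL_NAME_PROPERTY_KEY: "mapred.job.queue.name"}
job_conf = {"user.name": "A", "mapred.job.queue.name": "X"}
print(resolve_pool(scheduler_conf, job_conf))  # prints X
```

In this model, a job that does not set mapred.job.queue.name falls through to
the assumed default pool, which is why the queue name has to be set somewhere
for every job.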

However, we [while wearing our Hadoop developer/admin hat] don't want the
user/analyst to specify the queue themselves, so that we can enforce a
cluster-use policy.

Based on his/her username, we want to automatically select which Hadoop
queue, and subsequently which fair-share scheduler pool, his/her job should
go to. I'm pretty sure this is a common use-case, and I'm wondering how to do
this in Hadoop.

Any help/insights/pointers would be greatly appreciated.

Sagar
PS - Btw, we are using Cloudera CDH3u2, and the user jobs are Hive queries.