MapReduce >> mail # user >> Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue


Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue
Hi Guys,

We have a general-purpose Hive cluster [about 200 nodes] which is used for
various kinds of jobs, such as:

   - Production
   - Experimental/Research
   - Adhoc queries

We are using the fair-share scheduler to schedule them, and we have three
corresponding pools in the scheduler.

*Here is what we want.*

*A Hive query submitted by a user with user-name A should go to one of the
pools above based on a predefined mapping. Where and how do we specify this
mapping?*

*We can do this manually by adding -Dmapred.job.queue.name="X" on a
particular job run.*

This puts the job on the map-reduce queue named "X" and the following
configuration in the fair-share scheduler

  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>mapred.job.queue.name</value>
  </property>

maps this to a pool named "X" in the fair-share scheduler.

However, we [wearing our Hadoop developer/admin hat] don't want the
user/analyst to specify the queue themselves, so that we can enforce a
cluster-use policy.

Based on the username, we want to automatically select which Hadoop queue,
and subsequently which fair-share scheduler pool, the job should go to. I'm
pretty sure this is a common use case, and I'm wondering how to do this in
Hadoop.
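
To make the desired behaviour concrete, here is a minimal sketch of the
username-to-pool lookup, assuming a hypothetical client-side wrapper around
the Hive CLI. The user names, pool names, and the helper functions pool_for
and hive_command are made up for illustration; only the
mapred.job.queue.name property and the hive --hiveconf / -f options come from
Hadoop and Hive themselves.

```python
# Hypothetical mapping from user name to fair-share scheduler pool.
# All names below are illustrative, not taken from a real cluster.
USER_TO_POOL = {
    "etl_prod": "production",
    "alice": "research",
}

# Assumption: anyone not listed falls into the ad hoc pool.
DEFAULT_POOL = "adhoc"


def pool_for(user):
    """Return the pool for a user, falling back to the default pool."""
    return USER_TO_POOL.get(user, DEFAULT_POOL)


def hive_command(user, query_file):
    """Build a Hive CLI invocation that sets the queue name for this user."""
    pool = pool_for(user)
    return [
        "hive",
        "--hiveconf", "mapred.job.queue.name=" + pool,
        "-f", query_file,
    ]
```

A wrapper script could pass getpass.getuser() and the query file to
hive_command and run the result with subprocess. Note that this only sets a
default on the client side: a user could still override the property inside
the query with SET, so by itself it does not enforce the policy.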

Any help/insights/pointers would be greatly appreciated.

Sagar
P.S. We are using Cloudera CDH3u2, and the user jobs are Hive queries.