Chunky Gupta 2012-10-22, 11:57
Bejoy KS 2012-10-22, 12:18
Bertrand Dechoux 2012-10-22, 12:22
Chunky Gupta 2012-10-22, 12:52
Re: How to run multiple Hive queries in parallel
Bejoy KS 2012-10-22, 15:10
From the JobTracker web UI you can get the total number of map and reduce slots. The same web UI also shows the number of running map/reduce tasks. Subtracting the second value from the first gives you the available slots.
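A minimal sketch of that subtraction, using hypothetical numbers (4 slave nodes with 2 map slots and 1 reduce slot each). In Hadoop 1.x the per-node slot counts come from `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`, but the real totals should be read off your own web UI:

```python
# Hypothetical figures: replace them with the values shown on your JobTracker web UI.

def available_slots(capacity: int, running: int) -> int:
    """Free slots = configured slot capacity minus tasks currently running."""
    if running > capacity:
        raise ValueError("running tasks cannot exceed slot capacity")
    return capacity - running

# Hypothetical cluster: 4 slave nodes, each with 2 map slots and 1 reduce slot.
map_capacity, reduce_capacity = 4 * 2, 4 * 1
running_maps, running_reduces = 8, 2

print("free map slots:", available_slots(map_capacity, running_maps))
print("free reduce slots:", available_slots(reduce_capacity, running_reduces))
```

Here all 8 map slots are busy, so any newly submitted query would wait for map slots even though 2 reduce slots are idle.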
The Fair Scheduler is a feature of MapReduce, not of Hive. It is primarily used to control the number of task slots used by each user/pool in a cluster. You can read more @
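To make the difference concrete, here is a simplified Python model (an illustrative assumption, not the Fair Scheduler's actual code) that compares FIFO, where the first-submitted job takes every free slot, with a max-min fair split of slots across pools. The pool names and demands are hypothetical. The real Fair Scheduler also supports pool weights, minimum shares and preemption, and it hands out slots incrementally as tasks finish.

```python
from typing import Dict

def fifo_shares(total_slots: int, demand: Dict[str, int]) -> Dict[str, int]:
    """FIFO model: pools are served in submission (dict insertion) order."""
    share, remaining = {}, total_slots
    for pool, want in demand.items():
        give = min(want, remaining)
        share[pool] = give
        remaining -= give
    return share

def fair_shares(total_slots: int, demand: Dict[str, int]) -> Dict[str, int]:
    """Max-min fair model: each round, split the remaining slots evenly among
    pools that still want more; no pool gets more than it asked for."""
    share = {pool: 0 for pool in demand}
    remaining = total_slots
    while remaining > 0:
        hungry = [p for p in sorted(demand) if share[p] < demand[p]]
        if not hungry:
            break  # every pool is satisfied; leftover slots stay idle
        per_pool = max(1, remaining // len(hungry))
        for pool in hungry:
            give = min(per_pool, demand[pool] - share[pool], remaining)
            share[pool] += give
            remaining -= give
            if remaining == 0:
                break
    return share

# Hypothetical: a 20-task query submitted first, then a 2-task and another 20-task query.
demand = {"alice": 20, "bob": 2, "carol": 20}
print("FIFO:", fifo_shares(8, demand))
print("Fair:", fair_shares(8, demand))
```

With 8 map slots, FIFO gives all 8 to the query submitted first, so the other two wait. The fair split gives the small query the 2 slots it needs and divides the remaining 6 evenly between the two large ones.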
Sent from handheld, please excuse typos.
From: Chunky Gupta <[EMAIL PROTECTED]>
Date: Mon, 22 Oct 2012 18:22:03
To: <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Subject: Re: How to run multiple Hive queries in parallel
Hi Bejoy and Bertrand,
Thanks for the quick reply.
I think task slots are not available in my cluster because I have only 4 slave machines.
I am actually a beginner with Hive, so could you let me know how I can check
whether task slots are available or not?
I have different user credentials to log in to my name node machine, but
I don't know much about the Fair Scheduler.
If task slots are not available and are exhausted, could you
please point me to a publicly available fair scheduler that I can
integrate with Hive to solve my problem?
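For reference, the Fair Scheduler bundled with Hadoop 1.x (MRv1) is switched on for the JobTracker in `mapred-site.xml`. The sketch below only emits those properties with Python's standard `xml.etree.ElementTree`. The property names follow the Hadoop 1.x Fair Scheduler documentation, while the allocation-file path is a hypothetical placeholder. Depending on the distribution, the scheduler jar may also need to be on the JobTracker classpath, and the JobTracker has to be restarted. Check the documentation for your EMR/Hadoop version for where this file lives.

```python
import xml.etree.ElementTree as ET

# Hadoop 1.x (MRv1) Fair Scheduler properties.
# The allocation-file path below is a hypothetical example location.
FAIR_SCHEDULER_PROPS = {
    "mapred.jobtracker.taskScheduler": "org.apache.hadoop.mapred.FairScheduler",
    "mapred.fairscheduler.allocation.file": "/etc/hadoop/conf/fair-scheduler.xml",
    # Pools are keyed on this job property. user.name is the default, which is
    # why users sharing one login end up in the same pool.
    "mapred.fairscheduler.poolnameproperty": "user.name",
}

def property_elements(props: dict) -> str:
    """Return a <configuration> block whose <property> entries can be
    merged into mapred-site.xml."""
    config = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(config, encoding="unicode")

if __name__ == "__main__":
    print(property_elements(FAIR_SCHEDULER_PROPS))
```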
On Mon, Oct 22, 2012 at 5:52 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
> Bejoy is right. I just want to say explicitly that the scheduler
> configuration is orthogonal to the use of Hive (i.e. the same
> problem arises with Pig or standard MapReduce jobs).
> PS: There is also the Capacity Scheduler.
> On Mon, Oct 22, 2012 at 2:18 PM, Bejoy KS <[EMAIL PROTECTED]> wrote:
>> Are your Hive queries waiting even though there are task slots
>> available on your cluster?
>> If task slots are getting exhausted and you need parallelism here, then
>> you may need to look at using the Fair Scheduler together with
>> a separate user account for each user, so that each user gets a fair share
>> of the task slots.
>> Bejoy KS
>> *From: * Chunky Gupta <[EMAIL PROTECTED]>
>> *Date: *Mon, 22 Oct 2012 17:27:45 +0530
>> *To: *<[EMAIL PROTECTED]>
>> *ReplyTo: * [EMAIL PROTECTED]
>> *Subject: *How to run multiple Hive queries in parallel
>> I have one name node machine with 4 slave machines under it
>> to run jobs.
>> The way users run queries is:
>> - They ssh into the name node machine.
>> - They start Hive and submit their queries.
>> Currently multiple users log in with the same credentials and submit queries.
>> Whenever 2 or more users try to run queries at the same time from different
>> Hive consoles, only one query runs at a time: the next query starts
>> executing only when the previous one has finished, and so on.
>> In this scenario, if a large query was submitted earlier,
>> all the other queries have to wait for it to complete.
>> I want to run multiple queries at the same time. Is there any way, or any
>> configuration parameter, to do this?
>> PS: The data is in S3, and we are running Hive on AWS EMR
>> in interactive mode.
>> Thank you,
> Bertrand Dechoux