Hive, mail # user - How to run multiple Hive queries in parallel


Chunky Gupta 2012-10-22, 11:57
Bejoy KS 2012-10-22, 12:18
Bertrand Dechoux 2012-10-22, 12:22
Chunky Gupta 2012-10-22, 12:52

Re: How to run multiple Hive queries in parallel
Bejoy KS 2012-10-22, 15:10
Hi

From the JobTracker web UI you can get the total number of map and reduce slots. The same web UI also shows the number of running map and reduce tasks. Subtracting the second value from the first gives you the number of available slots.
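
As a minimal sketch of that subtraction (the function name and the slot counts are made up for illustration, they are not read from any real cluster):

```python
def available_slots(total_slots: int, running_tasks: int) -> int:
    """Free slots = total slot capacity minus tasks currently running."""
    if total_slots < 0 or running_tasks < 0:
        raise ValueError("counts must be non-negative")
    # Clamp at zero in case the two readings were taken at slightly different times.
    return max(total_slots - running_tasks, 0)


# Hypothetical figures copied by hand from the JobTracker web UI.
map_free = available_slots(total_slots=8, running_tasks=8)
reduce_free = available_slots(total_slots=4, running_tasks=1)
print(f"free map slots: {map_free}, free reduce slots: {reduce_free}")
```

If the free map slot count is 0 while one large query is running, any query submitted later will probably have to wait for slots to free up.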

The fair scheduler is a feature of MapReduce, not of Hive. It is primarily used to control the number of slots each user or pool can use in a cluster. You can read more at:

http://hadoop.apache.org/docs/mapreduce/r0.20.2/fair_scheduler.html
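
To give a rough idea of what the per-pool settings look like, here is a small sketch that writes out a fair scheduler allocation file. The element names (allocations, pool, minMaps, minReduces) are the ones I recall from the page linked above, so please verify them against your Hadoop version. The pool names "alice" and "bob" and the slot numbers are purely hypothetical, as is the helper name build_allocations.

```python
import xml.etree.ElementTree as ET


def build_allocations(pools: dict) -> str:
    """Build allocation-file XML from {pool_name: (min_maps, min_reduces)}."""
    root = ET.Element("allocations")
    for name, (min_maps, min_reduces) in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        # Guaranteed minimum share of map and reduce slots for this pool.
        ET.SubElement(pool, "minMaps").text = str(min_maps)
        ET.SubElement(pool, "minReduces").text = str(min_reduces)
    return ET.tostring(root, encoding="unicode")


# Hypothetical pools, one per user account; the numbers are placeholders.
xml_text = build_allocations({"alice": (2, 1), "bob": (2, 1)})
print(xml_text)
```

The scheduler itself still has to be enabled in the JobTracker configuration, and as far as I remember it assigns jobs to pools by user name by default, which is why separate user accounts matter.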
Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Chunky Gupta <[EMAIL PROTECTED]>
Date: Mon, 22 Oct 2012 18:22:03
To: <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Subject: Re: How to run multiple Hive queries in parallel

Hi Bejoy and Bertrand

Thanks for the quick reply.

I think task slots are not available in my cluster because I have only 4
slave machines.
Actually, I am a beginner with Hive, so could you let me know how I can check
whether task slots are available or not?

I have different user credentials to log in to my name node machine, but
I don't know much about the fair scheduler.

If task slots are not available and are exhausted, then could you
please point me to a publicly available fair scheduler that I can
integrate with Hive to solve my problem?

Thank You,
Chunky.

On Mon, Oct 22, 2012 at 5:52 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:

> Bejoy is right. I just want to say explicitly that the scheduler
> configuration is orthogonal to the use of Hive (i.e. the same
> problem arises with Pig or standard MapReduce jobs).
>
> Regards
>
> Bertrand
>
> PS: There is also the capacity scheduler.
>
>
> On Mon, Oct 22, 2012 at 2:18 PM, Bejoy KS <[EMAIL PROTECTED]> wrote:
>
>> Hi
>>
>> Are your Hive queries in waiting mode even though there are task slots
>> available on your cluster?
>>
>> If task slots are getting exhausted and you need parallelism here, then
>> you may need to look at using the fair scheduler with a separate user
>> account for each user, so that each user gets a fair share of task
>> slots.
>>
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>> ------------------------------
>> *From: * Chunky Gupta <[EMAIL PROTECTED]>
>> *Date: *Mon, 22 Oct 2012 17:27:45 +0530
>> *To: *<[EMAIL PROTECTED]>
>> *ReplyTo: * [EMAIL PROTECTED]
>> *Subject: *How to run multiple Hive queries in parallel
>>
>> Hi,
>>
>> I have one name node machine with 4 slave machines under it
>> to run the jobs.
>>
>> The way users run queries is
>> - They ssh into the name node machine
>> - They initiate hive and submit their queries
>>
>> Currently multiple users log in with the same credentials and submit
>> queries.
>>
>> Whenever 2 or more users try to run queries at the same time from different
>> Hive consoles, only one query runs at a time; the next query starts
>> executing only when that query has finished, and so on.
>>
>> In this scenario, if a large query was submitted earlier,
>> then all the other queries have to wait for that query to complete.
>>
>> I want to run multiple queries at the same time. Is there any way or any
>> configuration parameter to do this?
>>
>> PS: The data is in S3, and I am running Hive on AWS EMR infrastructure in
>> interactive mode.
>>
>> Thank You,
>> Chunky.
>>
>>
>
>
> --
> Bertrand Dechoux
>