Hive, mail # user - How to run multiple Hive queries in parallel


Re: How to run multiple Hive queries in parallel
Chunky Gupta 2012-10-22, 12:52
Hi Bejoy and Bertrand,

Thanks for the quick reply.

I think task slots are not available in my cluster because I have only 4
slave machines.
I am a beginner to Hive, so could you let me know how I can check
whether task slots are available or not?
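A rough way to reason about slot availability: in Hadoop 1.x (MR1), each TaskTracker offers a fixed number of map and reduce slots, and the JobTracker web UI summarizes the cluster's task capacity and the tasks currently running. The sketch below only does that arithmetic. The per-node slot count of 2 is an assumption for illustration, and `free_slots` is a hypothetical helper, not a Hadoop API.

```python
# Hypothetical helper: estimate idle task slots on a Hadoop 1.x (MR1) cluster.
# slots_per_node is an assumption here; on a real cluster it comes from the
# TaskTracker slot settings, and the running-task count from the JobTracker UI.

def free_slots(nodes: int, slots_per_node: int, running_tasks: int) -> int:
    """Return how many task slots are idle, never below zero."""
    capacity = nodes * slots_per_node
    return max(capacity - running_tasks, 0)


# Example: 4 slave nodes with an assumed 2 map slots each, while one large
# query is running 8 map tasks.
print(free_slots(nodes=4, slots_per_node=2, running_tasks=8))  # 0
```

If the result is 0 while a query is running, a newly submitted query has no slot to start in until tasks finish, which would be consistent with the waiting described in this thread.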

I have credentials for different users to log in to my name node machine, but
I don't know much about the fair scheduler.

If task slots are not available and are exhausted, could you
please point me to a publicly available fair scheduler that I can
integrate with Hive to solve my problem?
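Hadoop 1.x ships a Fair Scheduler as a contrib module that the JobTracker can use instead of its default FIFO scheduler. The sketch below renders `mapred-site.xml` entries of the kind used to switch schedulers. The property names are the ones documented for Hadoop 1.x MR1 and should be checked against the Hadoop version on your EMR cluster; `mapred_site_fragment` is a hypothetical helper. Depending on the distribution, the fair scheduler jar may also need to be on the JobTracker's classpath, and the JobTracker typically has to be restarted for the change to take effect.

```python
import xml.etree.ElementTree as ET

# Assumed Hadoop 1.x (MR1) properties for replacing the default FIFO
# scheduler with the Fair Scheduler; verify against your Hadoop version.
FAIR_SCHEDULER_PROPS = {
    "mapred.jobtracker.taskScheduler": "org.apache.hadoop.mapred.FairScheduler",
    # Jobs are grouped into pools by this jobconf property; user.name
    # gives one pool per submitting user.
    "mapred.fairscheduler.poolnameproperty": "user.name",
}


def mapred_site_fragment(props: dict[str, str]) -> str:
    """Render Hadoop <property> entries inside a <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(mapred_site_fragment(FAIR_SCHEDULER_PROPS))
```

Pooling by `user.name` only separates queries that are submitted under different user accounts.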

Thank You,
Chunky.

On Mon, Oct 22, 2012 at 5:52 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:

> Bejoy is right. I just want to say explicitly that the scheduler
> configuration is orthogonal to the use of Hive (i.e., you would have the
> same problem with Pig or standard MapReduce jobs).
>
> Regards
>
> Bertrand
>
> PS: There is also the Capacity Scheduler.
>
>
> On Mon, Oct 22, 2012 at 2:18 PM, Bejoy KS <[EMAIL PROTECTED]> wrote:
>
>> Hi
>>
>> Are your Hive queries waiting even though there are task slots
>> available on your cluster?
>>
>> If task slots are being exhausted and you need parallelism, then
>> you may need to look at using the fair scheduler with a separate
>> user account for each user, so that each user gets a fair share
>> of the task slots.
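To illustrate what a "fair share of task slots" means, here is a simplified max-min fair split in which slots are handed out round-robin to users that still have pending tasks. This is a toy model, not Hadoop's Fair Scheduler implementation (which also supports pools, weights, minimum shares, and preemption); `fair_share` and the user names and demands are made up for illustration.

```python
def fair_share(total_slots: int, demands: dict[str, int]) -> dict[str, int]:
    """Max-min fair split of integer task slots among users.

    Toy model: slots are given out one at a time, round-robin, to users
    whose demand is not yet met, until slots or demand run out.
    """
    alloc = {user: 0 for user in demands}
    remaining = total_slots
    while remaining > 0:
        hungry = [u for u in demands if alloc[u] < demands[u]]
        if not hungry:
            break  # every user is satisfied; leftover slots stay idle
        for u in hungry:
            if remaining == 0:
                break
            alloc[u] += 1
            remaining -= 1
    return alloc


# 8 slots; bob needs only 2, so alice and carol split the rest evenly.
print(fair_share(8, {"alice": 20, "bob": 2, "carol": 10}))
# {'alice': 3, 'bob': 2, 'carol': 3}
```

This only helps if users are distinguishable: in the setup described in this thread everyone logs in with the same credentials, so a per-user split would see a single user.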
>>
>>
>> Regards
>> Bejoy KS
>>
>> ------------------------------
>> From: Chunky Gupta <[EMAIL PROTECTED]>
>> Date: Mon, 22 Oct 2012 17:27:45 +0530
>> To: <[EMAIL PROTECTED]>
>> ReplyTo: [EMAIL PROTECTED]
>> Subject: How to run multiple Hive queries in parallel
>>
>> Hi,
>>
>> I have one name node machine with 4 slave machines under it
>> to run the jobs.
>>
>> The way users run queries is:
>> - They ssh into the name node machine
>> - They start Hive and submit their queries
>>
>> Currently, multiple users log in with the same credentials and submit
>> queries.
>>
>> Whenever 2 or more users try to run queries at the same time from different
>> Hive consoles, only one query runs at a time; the next query starts
>> executing only when that one has finished, and so on.
>>
>> In this scenario, if a large query was submitted earlier,
>> all the other queries have to wait for it to complete.
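Hadoop 1.x's default job scheduler is FIFO, and the blocking described here is what strict first-in, first-out execution produces when the first query's tasks occupy the whole cluster. A minimal sketch, assuming each query holds the entire cluster until it finishes and using made-up run times; `fifo_waits` is a hypothetical helper.

```python
def fifo_waits(durations: list[float]) -> list[float]:
    """Waiting time of each query when queries run strictly one after another.

    Assumes each query occupies every slot until it completes, so a query
    cannot start before all earlier queries have finished.
    """
    waits = []
    elapsed = 0.0
    for d in durations:
        waits.append(elapsed)  # time spent queued behind earlier queries
        elapsed += d
    return waits


# Hypothetical run times in minutes: one large query submitted first,
# followed by two small ones.
print(fifo_waits([60, 2, 3]))  # [0.0, 60.0, 62.0]
```

The two small queries wait about an hour for a few minutes of work; a fair-sharing scheduler would instead give them some slots while the large query is still running.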
>>
>> I want to run multiple queries at the same time. Is there any way, or any
>> configuration parameter, to do this?
>>
>> PS: The data is in S3, and we are running Hive on AWS EMR infrastructure in
>> interactive mode.
>>
>> Thank You,
>> Chunky.
>>
>>
>
>
> --
> Bertrand Dechoux
>