Re: How to run multiple Hive queries in parallel
Bertrand Dechoux 2012-10-22, 12:22
Bejoy is right. I just want to say explicitly that the scheduler
configuration is orthogonal to the use of Hive (i.e. you would have the same
problem with Pig or standard MapReduce jobs).
PS : There is also the capacity scheduler.
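To make the difference concrete, here is a toy Python model of how task slots could be handed out under a FIFO policy versus a fair-share policy. It is only a sketch, not Hadoop's scheduler code: the function names, the job names and the slot counts are all made up for illustration, and real schedulers also account for pools, weights and preemption.

```python
def fifo_allocate(demands, total_slots):
    """FIFO sketch: jobs are served in submission order and each one
    takes as many free slots as it asks for before the next is considered."""
    allocation = {}
    free = total_slots
    for job, wanted in demands:  # demands is a list of (job, slots_wanted)
        granted = min(wanted, free)
        allocation[job] = granted
        free -= granted
    return allocation


def fair_allocate(demands, total_slots):
    """Fair-share sketch: repeatedly split the free slots evenly among
    the jobs that still want more, until slots or demand run out."""
    allocation = {job: 0 for job, _ in demands}
    remaining = dict(demands)  # slots each job still wants
    free = total_slots
    while free > 0 and remaining:
        # Give every unsatisfied job an equal share, at least one slot.
        share = max(free // len(remaining), 1)
        for job in list(remaining):
            if free == 0:
                break
            granted = min(share, remaining[job], free)
            allocation[job] += granted
            remaining[job] -= granted
            free -= granted
            if remaining[job] == 0:
                del remaining[job]
    return allocation


# Hypothetical workload: one large query submitted first, then two small ones,
# on a cluster assumed to have 8 task slots in total.
demands = [("big_query", 100), ("small_a", 4), ("small_b", 4)]
print(fifo_allocate(demands, 8))
print(fair_allocate(demands, 8))
```

In this made-up workload the FIFO model gives all 8 slots to `big_query` and none to the two small queries, which matches the "everything waits behind the large query" behaviour described in the question. The fair-share model splits the slots 3/3/2, so the small queries make progress while the large one is still running.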
On Mon, Oct 22, 2012 at 2:18 PM, Bejoy KS <[EMAIL PROTECTED]> wrote:
> Are your Hive queries waiting even though there are task slots
> available on your cluster?
> If task slots are getting exhausted and you need parallelism here, then
> you may need to look at using the fair scheduler together with a
> separate user account for each user, so that each user gets a fair share
> of task slots.
> Bejoy KS
> *From: * Chunky Gupta <[EMAIL PROTECTED]>
> *Date: *Mon, 22 Oct 2012 17:27:45 +0530
> *To: *<[EMAIL PROTECTED]>
> *ReplyTo: * [EMAIL PROTECTED]
> *Subject: *How to run multiple Hive queries in parallel
> I have one name node machine, under which there are 4 slave machines
> to run the jobs.
> The way users run queries is:
> - They ssh into the name node machine
> - They start Hive and submit their queries
> Currently multiple users log in with the same credentials and submit
> queries. Whenever 2 or more users try to run queries at the same time from
> different Hive consoles, only one query runs at a time; the next query
> starts executing only when the previous one has finished, and so on.
> In this scenario, if a large query is submitted first, then
> all the other queries have to wait for that query to complete.
> I want to run multiple queries at the same time. Is there any way or any
> configuration parameter to do this?
> PS: The data is in S3 and I am running Hive on AWS EMR infrastructure in
> interactive mode.
> Thank You,