Re: Executing Hive Queries in Parallel
Subramanian, Sanjay 2014-04-21, 21:44
Instead of going into the Hive CLI, I would propose two ways: launching each query file with nohup, or running each query in its own screen session.
nohup hive -f path/to/query/file/hive1.hql >> ./hive1.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &
nohup hive -f path/to/query/file/hive2.hql >> ./hive2.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &
nohup hive -f path/to/query/file/hive3.hql >> ./hive3.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &
nohup hive -f path/to/query/file/hive4.hql >> ./hive4.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &
nohup hive -f path/to/query/file/hive5.hql >> ./hive5.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &
Each statement above launches MR jobs on your cluster and, depending on the cluster configuration, the jobs will run in parallel.
Scheduling jobs on the MR cluster is independent of Hive.
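As a rough sketch, the nohup approach can be wrapped in a small shell function that starts one background hive process per query file and then waits for all of them. The names run_hive_parallel, HIVE_BIN and LOG_DIR are made up for illustration and are not part of Hive; whether the resulting MR jobs actually overlap still depends on the cluster's scheduler.

```shell
# Hypothetical helper: HIVE_BIN and LOG_DIR are illustrative overrides,
# defaulting to the hive on PATH and the current directory.
HIVE_BIN="${HIVE_BIN:-hive}"
LOG_DIR="${LOG_DIR:-.}"

run_hive_parallel() {
  # One timestamp shared by all log files of this batch.
  stamp=$(date +%Y-%m-%d-%H-%M-%S)
  pids=""
  for f in "$@"; do
    # The trailing & puts each hive process in the background so the
    # loop does not block on the previous query.
    nohup "$HIVE_BIN" -f "$f" >> "$LOG_DIR/$(basename "$f")_${stamp}.log" 2>&1 &
    pids="$pids $!"
  done
  # Wait for every query and count the ones that exited non-zero.
  failed=0
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done
  return "$failed"
}
```

It could be called as run_hive_parallel path/to/query/file/hive1.hql path/to/query/file/hive2.hql, and the return status is the number of queries that failed.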
* Create a screen session
  * screen -S hive_query1
  * You are now inside the screen session hive_query1
  * hive -f path/to/query/file/hive1.hql
* Ctrl-A D
  * You detach from the screen session
* Repeat for each Hive query you want to run
  * I.e., say, 5 screen sessions, each running a Hive query
* To display the active screen sessions
  * screen -ls
* To attach to a screen session
  * screen -x hive_query1
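The screen steps above can also be scripted rather than typed by hand. A minimal sketch, assuming GNU screen is installed: screen -dmS starts a named session already detached and runs the given command inside it. The function name start_hive_screens, the SCREEN_BIN and HIVE_BIN variables, and the hive_<file> naming scheme are illustrative choices, not something from the original steps.

```shell
# Hypothetical helper: SCREEN_BIN and HIVE_BIN are illustrative overrides.
SCREEN_BIN="${SCREEN_BIN:-screen}"
HIVE_BIN="${HIVE_BIN:-hive}"

start_hive_screens() {
  for f in "$@"; do
    # Session name derived from the file name, e.g. hive1.hql -> hive_hive1.
    name="hive_$(basename "$f" .hql)"
    # -d -m: start the session detached; -S: give it a name.
    "$SCREEN_BIN" -dmS "$name" "$HIVE_BIN" -f "$f" || return 1
    echo "started screen session $name"
  done
}
```

A session started this way ends as soon as its hive command exits, so its output is only visible by attaching while the query is still running.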
From: saurabh <[EMAIL PROTECTED]>
Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Date: Monday, April 21, 2014 at 1:53 PM
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: Executing Hive Queries in Parallel
I need some input on executing Hive queries in parallel. I tried doing this using the CLI (by opening multiple SSH connections) and executed four HQL scripts; I observed that the queries were executed sequentially. All four queries were submitted; however, while the first one was executing, the others were in a pending state. I was performing this activity on EMR running in batch mode, hence I wasn't able to dig into the logs.
The Hive CLI uses a native Hive connection, which by default uses the FIFO scheduler. This might be one of the reasons the queries are executed in sequence.
I also observed that when multiple queries are executed using multiple HUE sessions, it provides the parallel execution functionality. Can you please suggest how the functionality of HUE can be replicated using CLI?
I am aware of the Beeswax client; however, I am not sure how it can be used during EMR batch-mode processing.
Thanks in advance for going through this. Kindly let me know your thoughts on the same.