Re: Executing Hive Queries in Parallel
Hey

Instead of working inside the Hive CLI, I would propose two ways:

NOHUP
nohup hive -f path/to/query/file/hive1.hql >> ./hive1.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &
nohup hive -f path/to/query/file/hive2.hql >> ./hive2.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &
nohup hive -f path/to/query/file/hive3.hql >> ./hive3.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &
nohup hive -f path/to/query/file/hive4.hql >> ./hive4.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &
nohup hive -f path/to/query/file/hive5.hql >> ./hive5.hql_`date +%Y-%m-%d-%H-%M-%S`.log 2>&1 &

Each command above launches MR jobs on your cluster and, depending on the cluster configuration, the jobs will run in parallel.
Scheduling jobs on the MR cluster is independent of Hive.
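
If there are many query files, the same idea can be wrapped in a small loop. This is only a sketch: the function name run_hive_parallel and the HIVE_BIN variable are made up for illustration (HIVE_BIN just lets you point at a different hive binary), and it assumes each file can run independently of the others. Each hive process is put in the background with &, and the function then waits for all of them.

```shell
# Sketch: run each .hql file given as an argument in its own background
# hive process, one timestamped log per file, then wait for all of them.
# run_hive_parallel and HIVE_BIN are illustrative names, not part of Hive.
run_hive_parallel() {
    hive_bin="${HIVE_BIN:-hive}"
    stamp=$(date +%Y-%m-%d-%H-%M-%S)
    pids=""
    for f in "$@"; do
        # The trailing & is what lets the next query start immediately.
        nohup "$hive_bin" -f "$f" >> "./$(basename "$f")_${stamp}.log" 2>&1 &
        pids="$pids $!"
    done
    failed=0
    for pid in $pids; do
        # wait returns the exit status of that background process.
        wait "$pid" || failed=$((failed + 1))
    done
    # Succeed only if every hive process exited with status 0.
    [ "$failed" -eq 0 ]
}
```

Usage would be something like: run_hive_parallel path/to/query/file/hive1.hql path/to/query/file/hive2.hql. Whether the resulting MR jobs actually overlap still depends on the cluster scheduler, as noted above.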

SCREEN sessions

  *   Create a screen session
     *   screen -S hive_query1
     *   You are now inside the screen session hive_query1
        *   hive -f path/to/query/file/hive1.hql
     *   Ctrl-A D
        *   You detach from the screen session; the query keeps running
  *   Repeat for each Hive query you want to run
     *   E.g. 5 screen sessions, each running a Hive query
  *   To list the active screen sessions
     *   screen -ls
  *   To attach to a screen session
     *   screen -x hive_query1
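
The steps above can also be scripted, because screen can start a session that is already detached (-d -m) under a given name (-S). Again only a sketch: start_hive_screens and SCREEN_BIN are made-up names, and the session is named after the query file, which is my assumption rather than anything screen requires.

```shell
# Sketch: start one detached screen session per .hql file.
# start_hive_screens and SCREEN_BIN are illustrative names.
start_hive_screens() {
    screen_bin="${SCREEN_BIN:-screen}"
    for f in "$@"; do
        # Session name = file name without the .hql suffix, e.g. hive1
        name=$(basename "$f" .hql)
        # -d -m: start detached; -S: session name; the rest is the command.
        "$screen_bin" -dmS "$name" hive -f "$f"
    done
}
```

Note that a session started this way ends when its hive command exits, so redirect the output to a file inside the command if you need to read it afterwards.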

Thanks
Warm Regards

Sanjay

From: saurabh <[EMAIL PROTECTED]>
Reply-To: <[EMAIL PROTECTED]>
Date: Monday, April 21, 2014 at 1:53 PM
To: <[EMAIL PROTECTED]>
Subject: Executing Hive Queries in Parallel
Hi,
I need some input on executing Hive queries in parallel. I tried doing this from the CLI (by opening multiple ssh connections) and executed 4 HQLs; the queries were executed sequentially. All four queries were submitted, but while the first one was executing the others stayed in a pending state. I was doing this on EMR running in batch mode, so I was not able to dig into the logs.

The Hive CLI uses a native Hive connection, which by default uses the FIFO scheduler. This might be one of the reasons the queries are executed in sequence.

I also observed that when multiple queries are executed from multiple HUE sessions, they do run in parallel. Can you please suggest how this behaviour of HUE can be replicated using the CLI?

I am aware of the Beeswax client, but I am not sure how it can be used during EMR batch-mode processing.

Thanks in advance for going through this. Kindly let me know your thoughts on the same.