Hive >> mail # user >> delay before query starts processing

Re: delay before query starts processing
Hi Marc,

You can try running the hive client with debug logging turned on and see
what it is trying to do at the JobTracker (JT) level.
hive -hiveconf hive.root.logger=ALL,console -e " DDL;"
hive -hiveconf hive.root.logger=ALL,console -f ddl.sql ;
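
If the debug output is long, it may help to save it to a file and look for the largest jump between consecutive timestamps, since that is probably where the delay sits. Below is a minimal sketch of such a check. The function name largest_gap is made up for this example, and it assumes each log line of interest has a date in its first field and an HH:MM:SS time in its second field (as in the "2013-01-30 20:45:30,526" lines of the quoted output). The actual layout depends on the log4j configuration, so the pattern may need adjusting. It also assumes the log does not cross midnight.

```shell
# Print the largest gap, in seconds, between consecutive timestamped lines
# of the log file given as $1, followed by the line that preceded the gap.
largest_gap() {
  awk '
    # Only consider lines whose second field starts with HH:MM:SS.
    $2 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
      split($2, t, "[:,]")                      # hours, minutes, seconds
      now = t[1] * 3600 + t[2] * 60 + t[3]      # seconds since midnight
      if (seen && now - prev > max) {
        max = now - prev
        line = prevline
      }
      prev = now
      prevline = $0
      seen = 1
    }
    END { printf "%d seconds after: %s\n", max, line }
  ' "$1"
}
```

To use it, append "2>&1 | tee hive-debug.log" to either of the commands above (hive-debug.log is just an example file name), then run "largest_gap hive-debug.log". The line it prints is the last thing logged before the pause, which should point at the step to investigate.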

Hope this helps.

On Wed, Jan 30, 2013 at 3:16 PM, Marc Limotte <[EMAIL PROTECTED]> wrote:

> Hi,
> I'm running in Amazon on an EMR cluster with hive 0.8.1.  We have a lot of
> other Hadoop jobs, but only started experimenting with Hive recently.
> I've been seeing a long pause between submitting a hive query and the
> actual start of the hadoop job... 10 minutes or more in some cases.  I'm
> wondering what's happening during this time.  Either a high level answer,
> or maybe there is some logging I can turn on?
> Here's some more detail.  I submit the query on the master using the hive
> cli, and start to see some output right away...
> Total MapReduce jobs = 2
> Launching Job 1 out of 2
> Number of reduce tasks not specified. Estimated from input data size: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> *[then a long delay here: 10 minutes or more... no activity in the hadoop
> job tracker ui] *
> … and then it continues normally ...
> Starting Job = job_201301160029_0082, Tracking URL =
> http://ip-xxxxxxxx.ec2.internal:9100/jobdetails.jsp?jobid=job_201301160029_0082
> Kill Command = /home/hadoop/bin/hadoop job
>  -Dmapred.job.tracker=xxxxxx:9001 -kill job_201301160029_0082
> Hadoop job information for Stage-1: number of mappers: 2; number of
> reducers: 1
> 2013-01-30 20:45:30,526 Stage-1 map = 0%,  reduce = 0%
> …
> This query is processing in the neighborhood of 500GB of data from S3.  A
> couple of possibilities I thought of… perhaps someone can confirm or deny:
> a) Is the data copied from S3 to HDFS during this time?
> b) I have a fairly large set of libs in HIVE_AUX_JAR_PATH (around 175
> MB). Does it have to copy these around to the tasks at this time?
> Any insights appreciated.
> Marc