Re: delay before query starts processing
Hi Marc,

You can try running the Hive client with debug mode on and see what it is
trying to do at the JobTracker level:
hive -hiveconf hive.root.logger=ALL,console -e "DDL;"
hive -hiveconf hive.root.logger=ALL,console -f ddl.sql
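
If the console gets too noisy, the same output can be captured to a file
instead; the paths below are the stock hive-log4j defaults and may not
match the EMR image exactly (they also assume console logging goes to
stderr):

  # capture the full client-side debug log to a file (file name is just an example)
  hive -hiveconf hive.root.logger=ALL,console -f ddl.sql 2> hive-debug.log
  # the CLI also keeps its own log; by default it ends up under /tmp/<user>/hive.log
  less /tmp/$USER/hive.log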

Hope this helps.

Thanks
-Abdelrahman
On Wed, Jan 30, 2013 at 3:16 PM, Marc Limotte <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm running in Amazon on an EMR cluster with hive 0.8.1.  We have a lot of
> other Hadoop jobs, but only started experimenting with Hive recently.
>
> I've been seeing a long pause between submitting a Hive query and the
> actual start of the Hadoop job... 10 minutes or more in some cases.  I'm
> wondering what's happening during this time.  Either a high level answer,
> or maybe there is some logging I can turn on?
>
> Here's some more detail.  I submit the query on the master using the hive
> cli, and start to see some output right away...
>
> Total MapReduce jobs = 2
> Launching Job 1 out of 2
> Number of reduce tasks not specified. Estimated from input data size: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
>
>
> [then a long delay here: 10 minutes or more... no activity in the Hadoop
> job tracker UI]
>
>
> … and then it continues normally ...
> Starting Job = job_201301160029_0082, Tracking URL = http://ip-xxxxxxxx.ec2.internal:9100/jobdetails.jsp?jobid=job_201301160029_0082
> Kill Command = /home/hadoop/bin/hadoop job
>  -Dmapred.job.tracker=xxxxxx:9001 -kill job_201301160029_0082
> Hadoop job information for Stage-1: number of mappers: 2; number of
> reducers: 1
> 2013-01-30 20:45:30,526 Stage-1 map = 0%,  reduce = 0%
> …
>
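[A rough way to narrow this down, run on the master node while the query
appears stuck; the pgrep pattern assumes the Hive CLI's usual CliDriver
main class:]

  # has the job reached the JobTracker at all? if nothing is listed, the time is likely being spent client-side
  hadoop job -list
  # see what the Hive CLI process itself is busy doing during the pause
  jstack $(pgrep -f org.apache.hadoop.hive.cli.CliDriver)
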
> This query is processing in the neighborhood of 500GB of data from S3.  A
> couple of possibilities I thought of… perhaps someone can confirm or deny:
> a) Is the data copied from S3 to HDFS during this time?
> b) I have a fairly large set of libs in HIVE_AUX_JARS_PATH (around 175
> MB) -- does it have to copy these around to the tasks at this time?
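
[For (b), a quick check is to see how big the aux payload actually is and
what Hive resolves it to; this sketch assumes HIVE_AUX_JARS_PATH points at
a directory of jars:]

  # size of the jars the client would attach to each job
  du -sh "$HIVE_AUX_JARS_PATH"
  # what Hive thinks the aux path is for the current session
  hive -e 'set hive.aux.jars.path;'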
>
> Any insights appreciated.
>
> Marc
>
>
>
>