
Hive, mail # user - delay before query starts processing


Marc Limotte 2013-01-30, 23:16
Re: delay before query starts processing
Re: delay before query starts processing
Ariel Marcus 2013-01-30, 23:22
From the archives:
http://mail-archives.apache.org/mod_mbox/hive-user/201110.mbox/%3CCAC9SPjuQtxOK1KtEmReD6OanNTgNM_uLkGQD+[EMAIL PROTECTED]%3E

TL;DR set hive.optimize.s3.query=true;
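A minimal sketch of applying that setting in a Hive CLI session before re-running the slow query. The property name is taken from the reply above. It appears to be an option of the Hive build that Amazon ships with EMR rather than of stock Apache Hive (an assumption), so it may have no effect elsewhere. The query itself is omitted.

```sql
-- Session-level setting: applies only to queries run later in this CLI session.
set hive.optimize.s3.query=true;

-- With no value, "set <key>;" echoes the current value, which confirms the
-- setting took effect before the long-running query is submitted.
set hive.optimize.s3.query;
```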

---------------------------------
Ariel Marcus, Consultant
www.openbi.com | [EMAIL PROTECTED]
150 N Michigan Avenue, Suite 2800, Chicago, IL 60601
Cell: 314-827-4356
On Wed, Jan 30, 2013 at 6:16 PM, Marc Limotte <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm running Hive 0.8.1 on an Amazon EMR cluster. We have a lot of other
> Hadoop jobs, but only started experimenting with Hive recently.
>
> I've been seeing a long pause between submitting a Hive query and the
> actual start of the Hadoop job: 10 minutes or more in some cases. I'm
> wondering what's happening during this time. Either a high-level answer
> or a pointer to some logging I can turn on would help.
>
> Here's some more detail. I submit the query on the master node using the
> Hive CLI and start to see some output right away:
>
> Total MapReduce jobs = 2
> Launching Job 1 out of 2
> Number of reduce tasks not specified. Estimated from input data size: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
>
>
> [then a long delay here: 10 minutes or more, with no activity in the
> Hadoop JobTracker UI]
>
>
> … and then it continues normally ...
> Starting Job = job_201301160029_0082, Tracking URL = http://ip-xxxxxxxx.ec2.internal:9100/jobdetails.jsp?jobid=job_201301160029_0082
> Kill Command = /home/hadoop/bin/hadoop job -Dmapred.job.tracker=xxxxxx:9001 -kill job_201301160029_0082
> Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
> 2013-01-30 20:45:30,526 Stage-1 map = 0%,  reduce = 0%
> …
>
> This query processes roughly 500 GB of data from S3. A couple of
> possibilities I thought of; perhaps someone can confirm or deny them:
> a) Is the data copied from S3 to HDFS during this time?
> b) I have a fairly large set of libs in HIVE_AUX_JAR_PATH (~175 MB). Do
> they have to be copied to the tasks at this time?
>
> Any insights appreciated.
>
> Marc
>
> ------------------------------
>
> This transmission is confidential and intended solely for the use of the
> recipient named above. It may contain confidential, proprietary, or legally
> privileged information. If you are not the intended recipient, you are
> hereby notified that any unauthorized review, use, disclosure or
> distribution is strictly prohibited. If you have received this transmission
> in error, please contact the sender by reply e-mail and delete the original
> transmission and all copies from your system.
>
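The reducer messages in the quoted output name three settings. The following sketch shows how they combine in the case reported there ("Estimated from input data size"), based on the reducer-estimation logic in Apache Hive of that era. EMR's build may differ, so treat this as an approximation. Here S is the total input size in bytes as Hive measures it, B is hive.exec.reducers.bytes.per.reducer, and R_max is hive.exec.reducers.max:

$$
R =
\begin{cases}
\texttt{mapred.reduce.tasks}, & \text{if set to a non-negative value},\\[4pt]
\min\!\left(R_{\max},\ \max\!\left(1,\ \left\lceil S / B \right\rceil\right)\right), & \text{otherwise}.
\end{cases}
$$

As a worked example, assume the stock Apache defaults of that period, B = 1,000,000,000 bytes and R_max = 999 (EMR may override these). An input of 2,500,000,000 bytes then gives ceil(2.5) = 3, and min(999, max(1, 3)) = 3 reducers.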

Abdelrahman Shettia 2013-01-30, 23:23
Marc Limotte 2013-02-01, 19:15