Hive, mail # user - Regression in trunk? (RE: Insert overwrite error using hive trunk)


RE: Regression in trunk? (RE: Insert overwrite error using hive trunk)
Pradeep Kamath 2010-09-27, 19:33
Yes, setting hive.merge.mapfiles=false caused the query to succeed. Unfortunately, without this setting there are no logs for the second job's tasks, since they are never even launched. The failure happens very quickly after the second job starts, before any tasks launch, so I could not find any logs with more detail. I am seeing this on trunk with the default setup - are there any settings I can set to get more information that would help?
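(One way to get more detail before the second job dies - a sketch, assuming a stock trunk build - is to bump Hive's root logger to DEBUG on the console for a single run; note hive.root.logger must be passed at startup via -hiveconf, not via SET in-session:

```sh
# Re-run the failing query with DEBUG logging on the console
bin/hive -hiveconf hive.root.logger=DEBUG,console \
  -e "insert overwrite table numbers_text_part partition(part='p1') select id, num from numbers_text;"
```
)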

Thanks,
Pradeep

________________________________
From: Ning Zhang [mailto:[EMAIL PROTECTED]]
Sent: Monday, September 27, 2010 11:34 AM
To: <[EMAIL PROTECTED]>
Subject: Re: Regression in trunk? (RE: Insert overwrite error using hive trunk)

This means it failed even with the previous map-reduce merge job. Without looking at the task log file, it's very hard to tell what happened.

A quick fix is to set hive.merge.mapfiles=false.
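(For reference, both settings discussed in this thread can be applied per session; a sketch using the table names from the failing query:

```sql
-- Workaround: skip the conditional merge job entirely
SET hive.merge.mapfiles=false;

-- Alternative: keep merging, but fall back to the older map-reduce merge job
-- SET hive.mergejob.maponly=false;

INSERT OVERWRITE TABLE numbers_text_part PARTITION (part='p1')
SELECT id, num FROM numbers_text;
```
)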
On Sep 27, 2010, at 11:22 AM, Pradeep Kamath wrote:

Here are the settings:

bin/hive -e "set;" | grep hive.merge

10/09/27 11:15:36 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
Hive history file=/tmp/pradeepk/hive_job_log_pradeepk_201009271115_1683572284.txt
hive.merge.mapfiles=true
hive.merge.mapredfiles=false
hive.merge.size.per.task=256000000
hive.merge.smallfiles.avgsize=16000000
hive.mergejob.maponly=true

(BTW these seem to be the defaults since I am not setting anything specifically for merging files)

I tried your suggestion of setting hive.mergejob.maponly to false, but I still see the same error (no tasks are launched and the job fails; this is the same with or without the change below):

[pradeepk@chargesize:/tmp/hive-svn/trunk/build/dist]bin/hive -e "set hive.mergejob.maponly=false; insert overwrite table numbers_text_part partition(part='p1') select id, num from numbers_text;"

On the console output I also see:

...
2010-09-27 11:16:57,827 Stage-1 map = 100%,  reduce = 0%
2010-09-27 11:17:00,859 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201009251752_1335
Ended Job = 1862840305, job is filtered out (removed at runtime).
Launching Job 2 out of 2

Any pointers much appreciated!

Thanks,

Pradeep

-----Original Message-----
From: Ning Zhang [mailto:[EMAIL PROTECTED]]
Sent: Monday, September 27, 2010 10:53 AM
To: <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>>
Subject: Re: Regression in trunk? (RE: Insert overwrite error using hive trunk)

This clearly indicates the merge still happens due to the conditional task. Can you double-check whether the parameter hive.merge.mapfiles is set?

Also, can you revert to the old map-reduce merging (rather than using CombineHiveInputFormat for map-only merging) by setting hive.mergejob.maponly=false?

I'm also curious why CombineHiveInputFormat failed in your environment. Can you also check your task log and see what errors are there (without changing any of the above parameters)?

On Sep 27, 2010, at 10:38 AM, Pradeep Kamath wrote:

> Here is the output of explain:
>
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-4 depends on stages: Stage-1 , consists of Stage-3, Stage-2
> Stage-3
> Stage-0 depends on stages: Stage-3, Stage-2
> Stage-2
>
> STAGE PLANS:
> Stage: Stage-1
>   Map Reduce
>     Alias -> Map Operator Tree:
>       numbers_text
>         TableScan
>           alias: numbers_text
>           Select Operator
>             expressions:
>                   expr: id
>                   type: int
>                   expr: num
>                   type: int
>             outputColumnNames: _col0, _col1
>             File Output Operator
>               compressed: false
>               GlobalTableId: 1
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat