RE: Regression in trunk? (RE: Insert overwrite error using hive trunk)
Here are the settings:

bin/hive -e "set;" | grep hive.merge

10/09/27 11:15:36 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively

Hive history file=/tmp/pradeepk/hive_job_log_pradeepk_201009271115_1683572284.txt

hive.merge.mapfiles=true

hive.merge.mapredfiles=false

hive.merge.size.per.task=256000000

hive.merge.smallfiles.avgsize=16000000

hive.mergejob.maponly=true

(BTW these seem to be the defaults since I am not setting anything specifically for merging files)
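A further sanity check (just a sketch, untested here) would be to disable merging entirely for the session and confirm the plain insert succeeds without the conditional merge job:

bin/hive -e "set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; insert overwrite table numbers_text_part partition(part='p1') select id, num from numbers_text;"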

I tried your suggestion of setting hive.mergejob.maponly to false, but I still see the same error (no tasks are launched and the job fails; this is the same with or without the change below):

[pradeepk@chargesize:/tmp/hive-svn/trunk/build/dist]bin/hive -e "set hive.mergejob.maponly=false; insert overwrite table numbers_text_part partition(part='p1') select id, num from numbers_text;"

On the console output I also see:

...
2010-09-27 11:16:57,827 Stage-1 map = 100%,  reduce = 0%
2010-09-27 11:17:00,859 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201009251752_1335
Ended Job = 1862840305, job is filtered out (removed at runtime).
Launching Job 2 out of 2
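To see more detail on what that second job is doing when it fails, one option (a sketch, assuming the stock hive-log4j setup) is to rerun the statement with console logging turned up via -hiveconf:

bin/hive -hiveconf hive.root.logger=DEBUG,console -e "insert overwrite table numbers_text_part partition(part='p1') select id, num from numbers_text;"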

Any pointers much appreciated!

Thanks,

Pradeep

-----Original Message-----
From: Ning Zhang [mailto:[EMAIL PROTECTED]]
Sent: Monday, September 27, 2010 10:53 AM
To: <[EMAIL PROTECTED]>
Subject: Re: Regression in trunk? (RE: Insert overwrite error using hive trunk)

This clearly indicates that the merge still happens due to the conditional task. Can you double-check whether the parameter hive.merge.mapfiles is set?

Also, can you revert to the old map-reduce merging (rather than using CombineHiveInputFormat for map-only merging) by setting hive.mergejob.maponly=false?

I'm also curious why CombineHiveInputFormat failed in your environment; can you check your task log and see what errors are there (without changing any of the above parameters)?
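For the task-log check, something along these lines might help (a sketch; <job_id> is a placeholder for the id of the failed merge job, which is not shown in the console snippet above):

hadoop job -list all            # list all jobs, including failed ones, to find the id of the second (merge) job
hadoop job -status <job_id>     # print its status and the tracking URL, from which the task logs can be browsed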

On Sep 27, 2010, at 10:38 AM, Pradeep Kamath wrote:

> Here is the output of explain:
>
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-4 depends on stages: Stage-1 , consists of Stage-3, Stage-2
> Stage-3
> Stage-0 depends on stages: Stage-3, Stage-2
> Stage-2
>
> STAGE PLANS:
> Stage: Stage-1
>   Map Reduce
>     Alias -> Map Operator Tree:
>       numbers_text
>         TableScan
>           alias: numbers_text
>           Select Operator
>             expressions:
>                   expr: id
>                   type: int
>                   expr: num
>                   type: int
>             outputColumnNames: _col0, _col1
>             File Output Operator
>               compressed: false
>               GlobalTableId: 1
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                   name: numbers_text_part
>
> Stage: Stage-4
>   Conditional Operator
>
> Stage: Stage-3
>   Move Operator
>     files:
>         hdfs directory: true
>         destination: hdfs://wilbur21.labs.corp.sp1.yahoo.com/tmp/hive-pradeepk/hive_2010-09-27_10-37-06_724_1678373180997754320/-ext-10000
>
> Stage: Stage-0
>   Move Operator
>     tables:
>         partition:
>           part p1
>         replace: true
>         table:
>             input format: org.apache.hadoop.mapred.TextInputFormat
>             output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>             serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>             name: numbers_text_part
>
> Stage: Stage-2
>   Map Reduce
>     Alias -> Map Operator Tree:
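For reference, the quoted plan above (truncated here) presumably comes from prefixing the failing statement with EXPLAIN, along these lines:

bin/hive -e "explain insert overwrite table numbers_text_part partition(part='p1') select id, num from numbers_text;"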