|
Pradeep Kamath
2010-09-27, 16:10
Ning Zhang
2010-09-27, 16:25
Ashutosh Chauhan
2010-09-27, 16:25
Ning Zhang
2010-09-27, 16:31
yongqiang he
2010-09-27, 16:44
Pradeep Kamath
2010-09-27, 17:38
Ning Zhang
2010-09-27, 17:52
Pradeep Kamath
2010-09-27, 18:22
Ning Zhang
2010-09-27, 18:33
Pradeep Kamath
2010-09-27, 19:33
Steven Wong
2010-09-27, 20:11
Ning Zhang
2010-09-27, 20:37
Pradeep Kamath
2010-09-28, 00:58
Amareshwari Sri Ramadasu
2010-09-28, 08:03
Pradeep Kamath
2010-09-28, 16:31
Pradeep Kamath
2010-09-28, 17:30
Ning Zhang
2010-09-28, 18:23
Pradeep Kamath
2010-09-28, 19:30
Ning Zhang
2010-09-28, 20:43
|
-
Regression in trunk? (RE: Insert overwrite error using hive trunk) Pradeep Kamath 2010-09-27, 16:10
Hi,
Any help in debugging the issue I am seeing below will be greatly appreciated. Unless I am doing something wrong, this seems to be a regression in trunk.

Thanks,
Pradeep

________________________________
From: Pradeep Kamath [mailto:[EMAIL PROTECTED]]
Sent: Friday, September 24, 2010 1:41 PM
To: [EMAIL PROTECTED]
Subject: Insert overwrite error using hive trunk

Hi,
I am trying to insert overwrite into a partitioned table reading data from a non partitioned table and seeing a failure in the second map reduce job - wonder if I am doing something wrong - any pointers appreciated (I am using latest trunk code against hadoop 20 cluster). Details below[1].

Thanks,
Pradeep

[1] Details:

bin/hive -e "describe numbers_text;"
col_name data_type comment
id int None
num int None

bin/hive -e "describe numbers_text_part;"
col_name data_type comment
id int None
num int None
# Partition Information
col_name data_type comment
part string None

bin/hive -e "select * from numbers_text;"
1 10
2 20

bin/hive -e "insert overwrite table numbers_text_part partition(part='p1') select id, num from numbers_text;"
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
...
2010-09-24 13:28:55,649 Stage-1 map = 0%, reduce = 0%
2010-09-24 13:28:58,687 Stage-1 map = 100%, reduce = 0%
2010-09-24 13:29:01,726 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201009241059_0281
Ended Job = -1897439470, job is filtered out (removed at runtime).
Launching Job 2 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
...
2010-09-24 13:29:03,504 Stage-2 map = 100%, reduce = 100%
Ended Job = job_201009241059_0282 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

tail /tmp/pradeepk/hive.log:
2010-09-24 13:29:01,888 WARN mapred.JobClient (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-09-24 13:29:01,903 WARN fs.FileSystem (FileSystem.java:fixName(153)) - "wilbur21.labs.corp.sp1.yahoo.com:8020" is a deprecated filesystem name. Use "hdfs://wilbur21.labs.corp.sp1.yahoo.com:8020/" instead.
2010-09-24 13:29:03,512 ERROR exec.MapRedTask (SessionState.java:printError(277)) - Ended Job = job_201009241059_0282 with errors
2010-09-24 13:29:03,537 ERROR ql.Driver (SessionState.java:printError(277)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
-
Re: Regression in trunk? (RE: Insert overwrite error using hive trunk) Ning Zhang 2010-09-27, 16:25
I'm guessing this is due to the merge task (the 2nd MR job that merges small files together). You can try 'set hive.merge.mapfiles=false;' before the query and see whether it succeeds.
If it is due to the merge job, can you attach the plan and check the mapper/reducer task log to see what errors/exceptions are there?
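Ning's suggestion to disable the map-file merge can be tried as a one-off session setting in front of the failing query. The following is a minimal sketch, assuming the same bin/hive launcher and the query from the original report; the QUERY and HIVE_CMD variable names are only illustrative, and the script merely prints the command when bin/hive is not present in the current directory.

```shell
#!/bin/sh
# Query copied from the original report in this thread.
QUERY="insert overwrite table numbers_text_part partition(part='p1') select id, num from numbers_text;"

# Disable the merge of map-only job output for this session only, then run the query.
HIVE_CMD="set hive.merge.mapfiles=false; ${QUERY}"

# Run only if the hive launcher exists here (path as used in the thread);
# otherwise just show what would be run.
if [ -x bin/hive ]; then
  bin/hive -e "${HIVE_CMD}"
else
  echo "would run: bin/hive -e \"${HIVE_CMD}\""
fi
```

Because the setting is passed inside the -e string, it applies only to that invocation and does not change any site configuration.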
-
Re: Regression in trunk? (RE: Insert overwrite error using hive trunk) Ashutosh Chauhan 2010-09-27, 16:25
I suspected the same. But even after setting this property, the second MR job still got launched and then failed.

Ashutosh
-
Re: Regression in trunk? (RE: Insert overwrite error using hive trunk) Ning Zhang 2010-09-27, 16:31
Can you run explain on your query after setting the parameter?
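The request to run explain after setting the parameter could look like the sketch below. It assumes the same bin/hive launcher and query as in the original report, and that the parameter in question is hive.merge.mapfiles; EXPLAIN_CMD is an illustrative variable name, and the script only prints the command when bin/hive is not available.

```shell
#!/bin/sh
# Query copied from the original report in this thread.
QUERY="insert overwrite table numbers_text_part partition(part='p1') select id, num from numbers_text;"

# Set the parameter first, then ask Hive for the plan instead of executing the query.
EXPLAIN_CMD="set hive.merge.mapfiles=false; explain ${QUERY}"

# Run only if the hive launcher exists here; otherwise show the command.
if [ -x bin/hive ]; then
  bin/hive -e "${EXPLAIN_CMD}"
else
  echo "would run: bin/hive -e \"${EXPLAIN_CMD}\""
fi
```

Keeping the set statement and the explain in one -e string ensures the plan is produced in the same session that carries the setting.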
-
Re: Regression in trunk? (RE: Insert overwrite error using hive trunk) yongqiang he 2010-09-27, 16:44
There is one ticket for insert overwrite local directory:
https://issues.apache.org/jira/browse/HIVE-1582
-
Re: Regression in trunk? (RE: Insert overwrite error using hive trunk) Pradeep Kamath 2010-09-27, 17:38
Here is the output of explain:

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-4 depends on stages: Stage-1 , consists of Stage-3, Stage-2
  Stage-3
  Stage-0 depends on stages: Stage-3, Stage-2
  Stage-2

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        numbers_text
          TableScan
            alias: numbers_text
            Select Operator
              expressions:
                expr: id
                type: int
                expr: num
                type: int
              outputColumnNames: _col0, _col1
              File Output Operator
                compressed: false
                GlobalTableId: 1
                table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: numbers_text_part

  Stage: Stage-4
    Conditional Operator

  Stage: Stage-3
    Move Operator
      files:
        hdfs directory: true
        destination: hdfs://wilbur21.labs.corp.sp1.yahoo.com/tmp/hive-pradeepk/hive_2010-09-27_10-37-06_724_1678373180997754320/-ext-10000

  Stage: Stage-0
    Move Operator
      tables:
        partition:
          part p1
        replace: true
        table:
          input format: org.apache.hadoop.mapred.TextInputFormat
          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          name: numbers_text_part

  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        hdfs://wilbur21.labs.corp.sp1.yahoo.com/tmp/hive-pradeepk/hive_2010-09-27_10-37-06_724_1678373180997754320/-ext-10002
          File Output Operator
            compressed: false
            GlobalTableId: 0
            table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: numbers_text_part
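The stage dependencies in Pradeep's explain output can be checked mechanically for the conditional stage that decides whether the merge job runs. This is a rough sketch, not a Hive feature: it assumes that a "consists of" entry in STAGE DEPENDENCIES marks such a conditional stage (as Stage-4 does in this plan), and the PLAN and CONDITIONAL variable names are made up for illustration.

```shell
#!/bin/sh
# Stage dependency lines copied from the explain output posted in this thread.
PLAN=$(cat <<'EOF'
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-4 depends on stages: Stage-1 , consists of Stage-3, Stage-2
  Stage-3
  Stage-0 depends on stages: Stage-3, Stage-2
  Stage-2
EOF
)

# Count the lines that describe a stage made up of alternative child stages
# (assumption: this is how the conditional merge stage shows up in the plan).
CONDITIONAL=$(printf '%s\n' "$PLAN" | grep -c 'consists of')

if [ "$CONDITIONAL" -gt 0 ]; then
  echo "plan still contains a conditional stage; the merge job may run"
else
  echo "no conditional stage found in the plan"
fi
```

With a real session, the here-document would be replaced by the captured output of the explain command.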
-
Re: Regression in trunk? (RE: Insert overwrite error using hive trunk) Ning Zhang 2010-09-27, 17:52
This clearly indicates that the merge still happens because of the conditional task. Can you double-check that the parameter (hive.merge.mapfiles) is set?
Also, can you revert to the old map-reduce merging (rather than using CombineHiveInputFormat for map-only merging) by setting hive.mergejob.maponly=false? I'm also curious why CombineHiveInputFormat failed in your environment. Can you check your task log and see what errors are there (without changing any of the above parameters)?
-
RE: Regression in trunk? (RE: Insert overwrite error using hive trunk) Pradeep Kamath 2010-09-27, 18:22
Here are the settings:

bin/hive -e "set;" | grep hive.merge
10/09/27 11:15:36 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
Hive history file=/tmp/pradeepk/hive_job_log_pradeepk_201009271115_1683572284.txt
hive.merge.mapfiles=true
hive.merge.mapredfiles=false
hive.merge.size.per.task=256000000
hive.merge.smallfiles.avgsize=16000000
hive.mergejob.maponly=true

(BTW these seem to be the defaults since I am not setting anything specifically for merging files)

I tried your suggestion of setting hive.mergejob.maponly to false, but still see the same error (no tasks are launched and the job fails - this is the same with or without the change below)

[pradeepk@chargesize:/tmp/hive-svn/trunk/build/dist]bin/hive -e "set hive.mergejob.maponly=false; insert overwrite table numbers_text_part partition(part='p1') select id, num from numbers_text;"

On the console output I also see:
...
2010-09-27 11:16:57,827 Stage-1 map = 100%, reduce = 0%
2010-09-27 11:17:00,859 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201009251752_1335
Ended Job = 1862840305, job is filtered out (removed at runtime).
Launching Job 2 out of 2

Any pointers much appreciated!

Thanks,
Pradeep
-
Re: Regression in trunk? (RE: Insert overwrite error using hive trunk) Ning Zhang 2010-09-27, 18:33
This means it failed even with the previous map-reduce merge job. Without looking at the task log file, it's very hard to tell what happened.
A quick fix is to set hive.merge.mapfiles=false.
-
RE: Regression in trunk? (RE: Insert overwrite error using hive trunk)Pradeep Kamath 2010-09-27, 19:33
Yes setting hive.merge.mapfiles=false caused the query to succeed. Unfortunately without this setting, there are no logs for tasks for the second job since they never get launced even. The failure is very quick after the second job is started and is even before any tasks launch. So I could not find any logs to get more messages. I am noticing this on trunk with the default set up - any settings I can set to get more information that can help?
Thanks, Pradeep
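For reference, the workaround reported above can be applied per invocation, in the same style as the earlier hive.mergejob.maponly attempt. This is a minimal sketch, assuming the tables from this thread and a bin/hive in the current directory; the no_merge_query helper name is hypothetical and not part of Hive.

```shell
# Hypothetical helper: prefix a HiveQL statement with the setting that
# disables the post-job file merge for this session only.
no_merge_query() {
  printf 'set hive.merge.mapfiles=false; %s\n' "$1"
}

# Query taken from the thread.
query="insert overwrite table numbers_text_part partition(part='p1') select id, num from numbers_text;"

# Run it only where the hive CLI is present; otherwise just show the statement.
if [ -x bin/hive ]; then
  bin/hive -e "$(no_merge_query "$query")"
else
  no_merge_query "$query"
fi
```

This only sidesteps the merge stage; it does not explain why the merge job fails.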
-
RE: Regression in trunk? (RE: Insert overwrite error using hive trunk)Steven Wong 2010-09-27, 20:11
Try "hive -hiveconf hive.root.logger=DEBUG,DRFA -e ..." to get more context of the error.
-
Re: Regression in trunk? (RE: Insert overwrite error using hive trunk)Ning Zhang 2010-09-27, 20:37
From the error info, it seems the 2nd job was launched and then failed, so I'm assuming some map tasks were started? If not, you can find the error message in the client log file /tmp/<userid>/hive.log on the machine running hive, after setting the hive.root.logger property Steven mentioned.

On Sep 27, 2010, at 1:11 PM, Steven Wong wrote:
>>>>> 2010-09-24 13:29:03,504 Stage-2 map = 100%, reduce = 100%
>>>>> Ended Job = job_201009241059_0282 with errors
>>>>> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
-
Re: Regression in trunk? (RE: Insert overwrite error using hive trunk)Pradeep Kamath 2010-09-28, 00:58
Here is some relevant stuff from /tmp/pradeepk/hive.logs - can't make much out of it:

2010-09-27 17:40:01,081 INFO exec.MapRedTask (SessionState.java:printInfo(268)) - Starting Job = job_201009251752_1341, Tracking URL = http://<hostname>:50030/jobdetails.jsp?jobid=job_201009251752_1341
2010-09-27 17:40:01,081 INFO exec.MapRedTask (SessionState.java:printInfo(268)) - Kill Command = /homes/pradeepk/hadoopcluster/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=<hostname>:50020 -kill job_201009251752_1341
2010-09-27 17:40:01,081 DEBUG ipc.Client (Client.java:sendParam(469)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk sending #129
2010-09-27 17:40:01,083 DEBUG ipc.Client (Client.java:receiveResponse(504)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk got value #129
2010-09-27 17:40:01,083 DEBUG ipc.RPC (RPC.java:invoke(225)) - Call: getJobStatus 2
2010-09-27 17:40:02,086 DEBUG ipc.Client (Client.java:sendParam(469)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk sending #130
2010-09-27 17:40:02,090 DEBUG ipc.Client (Client.java:receiveResponse(504)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk got value #130
2010-09-27 17:40:02,091 DEBUG ipc.RPC (RPC.java:invoke(225)) - Call: getJobStatus 5
2010-09-27 17:40:02,092 DEBUG ipc.Client (Client.java:sendParam(469)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk sending #131
2010-09-27 17:40:02,093 DEBUG ipc.Client (Client.java:receiveResponse(504)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk got value #131
2010-09-27 17:40:02,094 DEBUG ipc.RPC (RPC.java:invoke(225)) - Call: getJobStatus 2
2010-09-27 17:40:02,094 DEBUG ipc.Client (Client.java:sendParam(469)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk sending #132
2010-09-27 17:40:02,096 DEBUG ipc.Client (Client.java:receiveResponse(504)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk got value #132
2010-09-27 17:40:02,096 DEBUG ipc.RPC (RPC.java:invoke(225)) - Call: getJobProfile 2
2010-09-27 17:40:02,096 DEBUG ipc.Client (Client.java:sendParam(469)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk sending #133
2010-09-27 17:40:02,100 DEBUG ipc.Client (Client.java:receiveResponse(504)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk got value #133
2010-09-27 17:40:02,100 DEBUG ipc.RPC (RPC.java:invoke(225)) - Call: getJobCounters 4
2010-09-27 17:40:02,101 DEBUG mapred.Counters (Counters.java:<init>(151)) - Creating group org.apache.hadoop.hive.ql.exec.Operator$ProgressCounter with nothing
2010-09-27 17:40:02,101 DEBUG mapred.Counters (Counters.java:getCounterForName(277)) - Adding CREATED_FILES
2010-09-27 17:40:02,103 INFO exec.MapRedTask (SessionState.java:printInfo(268)) - 2010-09-27 17:40:02,103 Stage-2 map = 100%, reduce = 100%
2010-09-27 17:40:02,104 DEBUG ipc.Client (Client.java:sendParam(469)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk sending #134
2010-09-27 17:40:02,105 DEBUG ipc.Client (Client.java:receiveResponse(504)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk got value #134
2010-09-27 17:40:02,106 DEBUG ipc.RPC (RPC.java:invoke(225)) - Call: getJobStatus 2
2010-09-27 17:40:02,106 DEBUG ipc.Client (Client.java:sendParam(469)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk sending #135
2010-09-27 17:40:02,108 DEBUG ipc.Client (Client.java:receiveResponse(504)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk got value #135
2010-09-27 17:40:02,108 DEBUG ipc.RPC (RPC.java:invoke(225)) - Call: getJobCounters 2
2010-09-27 17:40:02,109 DEBUG mapred.Counters (Counters.java:<init>(151)) - Creating group org.apache.hadoop.hive.ql.exec.Operator$ProgressCounter with nothing
2010-09-27 17:40:02,109 DEBUG mapred.Counters (Counters.java:getCounterForName(277)) - Adding CREATED_FILES
2010-09-27 17:40:02,109 DEBUG ipc.Client (Client.java:sendParam(469)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk sending #136
2010-09-27 17:40:02,111 DEBUG ipc.Client (Client.java:receiveResponse(504)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk got value #136
2010-09-27 17:40:02,111 DEBUG ipc.RPC (RPC.java:invoke(225)) - Call: getJobStatus 2
2010-09-27 17:40:02,112 ERROR exec.MapRedTask (SessionState.java:printError(277)) - Ended Job = job_201009251752_1341 with errors
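Most of a DEBUG-level hive.log like the one above is per-RPC chatter from ipc.Client and ipc.RPC. A minimal sketch of filtering it out so the INFO and ERROR lines stand out; the helper name hive_log_signal is hypothetical, and the log path assumes the /tmp/<userid>/hive.log location mentioned in this thread.

```shell
# Hypothetical helper: drop the ipc.Client / ipc.RPC DEBUG lines from a hive.log.
# Other DEBUG lines (for example mapred.Counters) are kept.
hive_log_signal() {
  grep -Ev ' DEBUG ipc\.(Client|RPC) ' "$1"
}

# Assumed client log location for the current user.
log="/tmp/$(whoami)/hive.log"
if [ -f "$log" ]; then
  hive_log_signal "$log" | tail -n 50
fi
```

Applied to the excerpt above, this would leave the two "Starting Job" / "Kill Command" lines, the Counters lines, the Stage-2 progress line, and the final ERROR line.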
-
Re: Regression in trunk? (RE: Insert overwrite error using hive trunk)Amareshwari Sri Ramadasu 2010-09-28, 08:03
Pradeep, you might be hitting HADOOP-5759 and the job is not getting initialized at all. Look in JobTracker logs for the jobid to confirm the same.
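Checking the JobTracker logs for the job id, as suggested above, could look roughly like this. It is only a sketch: the helper name is hypothetical, and the log file location in the usage comment is an assumption that varies by installation.

```shell
# Hypothetical helper: print every line mentioning a given job id
# from one or more JobTracker log files.
jobtracker_lines_for_job() {
  job_id="$1"
  shift
  grep -F -- "$job_id" "$@"
}

# Usage (run on the JobTracker host; the path is an assumption):
# jobtracker_lines_for_job job_201009251752_1341 "$HADOOP_LOG_DIR"/hadoop-*-jobtracker-*.log
```

If the job never got initialized, any related exception should appear among the lines printed for that job id.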
-
Re: Regression in trunk? (RE: Insert overwrite error using hive trunk)Pradeep Kamath 2010-09-28, 16:31
With "hive -hiveconf hive.root.logger=DEBUG,DRFA -e ...", /tmp/<username>/hive.log seems to have pretty detailed log messages, including debug messages. I don't see the "initialization failed" message or the stack trace mentioned in HADOOP-5759. Is there any other place I need to check? On the UI I only see a map task in pending state and no further information (this is with hadoop-0.20.1). With a more recent hadoop I see no tasks launched at all. This used to work a month ago, so I am wondering if any changes in hive caused this.

Thanks, Pradeep
-
RE: Regression in trunk? (RE: Insert overwrite error using hive trunk)Pradeep Kamath 2010-09-28, 17:30
Should I open a jira for this? So far it seems like a regression.
Pradeep
-
Re: Regression in trunk? (RE: Insert overwrite error using hive trunk)Ning Zhang 2010-09-28, 18:23
Pradeep, can you open the tracking URL printed in the log and click through to the task log? The real error should be printed there. The link may have expired, so you may need to rerun the query and click on the new one.
I suspect the error is due to CombineHiveInputFormat not working with the Hadoop version you are using. Again, the first thing is to check the task log through the tracking URL.

Tracking URL = http://<hostname>:50030/jobdetails.jsp?jobid=job_201009251752_1341
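Since the tracking URL changes on every rerun, it can be pulled out of the client log rather than copied by hand. A minimal sketch, assuming the "Starting Job = ..., Tracking URL = ..." line format shown in this thread; the helper name tracking_urls is hypothetical.

```shell
# Hypothetical helper: print the tracking URL from each "Tracking URL = ..." log line.
tracking_urls() {
  sed -n 's/.*Tracking URL = \([^ ]*\).*/\1/p' "$1"
}

# Assumed client log location for the current user; show the most recent URL.
log="/tmp/$(whoami)/hive.log"
if [ -f "$log" ]; then
  tracking_urls "$log" | tail -n 1
fi
```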
-
RE: Regression in trunk? (RE: Insert overwrite error using hive trunk)Pradeep Kamath 2010-09-28, 19:30
Hi Ning,
With hadoop-0.20.1 (Apache release), on the UI (following the tracking URL) I only see a pending map task, and when I click through I finally see:
java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.hadoop.mapred.JobInProgress.getTaskInProgress(JobInProgress.java:2523)
    at org.apache.hadoop.mapred.taskdetails_jsp._jspService(taskdetails_jsp.java:118)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:324)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

I feel this is not a root cause exception. So I have not been able to find the root cause exception anywhere on the jobtracker UI. I pasted what I found in /tmp/<username>/hive.log and that wasn't very indicative either.

Pradeep
________________________________
From: Ning Zhang [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, September 28, 2010 11:24 AM
To: <[EMAIL PROTECTED]>
Subject: Re: Regression in trunk? (RE: Insert overwrite error using hive trunk)

Prodeep, can you open the tracking URL printed out from the log and click through to the task log? The real error should be printed over there. The link may be expired so you need to rerun the query and click on the new one. I'm suspecting the error is due to the fact that CombineHiveInputFormat with the Hadoop version you are using. Again, the first thing is to check the task log through the tracking URL.

Tracking URL = http://<hostname>:50030/jobdetails.jsp?jobid=job_201009251752_1341

On Sep 27, 2010, at 5:58 PM, Pradeep Kamath wrote:
Here is some relevant stuff from /tmp/pradeepk/hive.logs - can't make much out of it:

2010-09-27 17:40:01,081 INFO exec.MapRedTask (SessionState.java:printInfo(268)) - Starting Job = job_201009251752_1341, Tracking URL = http://<hostname>:50030/jobdetails.jsp?jobid=job_201009251752_1341
2010-09-27 17:40:01,081 INFO exec.MapRedTask (SessionState.java:printInfo(268)) - Kill Command = /homes/pradeepk/hadoopcluster/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=<hostname>:50020 -kill job_201009251752_1341
2010-09-27 17:40:01,081 DEBUG ipc.Client (Client.java:sendParam(469)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk sending #129
2010-09-27 17:40:01,083 DEBUG ipc.Client (Client.java:receiveResponse(504)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk got value #129
2010-09-27 17:40:01,083 DEBUG ipc.RPC (RPC.java:invoke(225)) - Call: getJobStatus 2
2010-09-27 17:40:02,086 DEBUG ipc.Client (Client.java:sendParam(469)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk sending #130
2010-09-27 17:40:02,090 DEBUG ipc.Client (Client.java:receiveResponse(504)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk got value #130
2010-09-27 17:40:02,091 DEBUG ipc.RPC (RPC.java:invoke(225)) - Call: getJobStatus 5
2010-09-27 17:40:02,092 DEBUG ipc.Client (Client.java:sendParam(469)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk sending #131
2010-09-27 17:40:02,093 DEBUG ipc.Client (Client.java:receiveResponse(504)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk got value #131
2010-09-27 17:40:02,094 DEBUG ipc.RPC (RPC.java:invoke(225)) - Call: getJobStatus 2
2010-09-27 17:40:02,094 DEBUG ipc.Client (Client.java:sendParam(469)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk sending #132
2010-09-27 17:40:02,096 DEBUG ipc.Client (Client.java:receiveResponse(504)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk got value #132
2010-09-27 17:40:02,096 DEBUG ipc.RPC (RPC.java:invoke(225)) - Call: getJobProfile 2
2010-09-27 17:40:02,096 DEBUG ipc.Client (Client.java:sendParam(469)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk sending #133
2010-09-27 17:40:02,100 DEBUG ipc.Client (Client.java:receiveResponse(504)) - IPC Client (47) connection to <hostname>/216.252.118.203:50020 from pradeepk got value #133
2010-09-27 17:40:02,100 DEBUG ipc.RPC (RPC.java:invoke(225)) - Call: getJobCounters 4
2010-09-27 17:40:02,101 DEBUG mapred.Counters (Counters.java:<init>(151)) - Creating group org.apache.hadoop.hive.ql.exec.Operator$ProgressCounter with nothing
2010-09-27 17:40:02,101 DEBUG mapred.Counters (Counters.java:getCounterForName(277)) - Adding CREATED_FILES
2010-09-27 17:40:02,103 INFO exec.MapRedTask (SessionState.java:printInfo(268)) - 2010-09-27 17:40:02,103 Stage-2 map = 100%, reduce = 100%
2010-09-27 17:40:02,104 DEBUG ip
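Most of the hive.log excerpt quoted in this message is DEBUG-level IPC polling, which makes the few INFO and ERROR entries hard to spot. A minimal sketch for filtering the client log follows, assuming the log4j layout seen in this thread (date, time, level, logger). The function name `hive_log_summary` and the default path are illustrative assumptions, not part of Hive. Note that this only surfaces client-side messages; task-side errors would still have to come from the task log reached through the tracking URL.

```shell
# Sketch: print only the non-DEBUG entries of a Hive client log.
# Assumes entries start with "<yyyy-mm-dd> <hh:mm:ss,ms> <LEVEL> <logger> ...",
# as in the excerpts in this thread.
# hive_log_summary is a hypothetical helper name; the default path mirrors
# /tmp/<username>/hive.log and is an assumption about the local setup.
hive_log_summary() {
  grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:,]+ +(INFO|WARN|ERROR|FATAL) ' \
    "${1:-/tmp/$USER/hive.log}"
}
```

Calling `hive_log_summary /tmp/pradeepk/hive.log` on the log from this thread would be expected to keep lines such as the `Starting Job`, `Kill Command` and `Ended Job = ... with errors` entries while dropping the `ipc.Client` and `ipc.RPC` chatter.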
-
Re: Regression in trunk? (RE: Insert overwrite error using hive trunk)Ning Zhang 2010-09-28, 20:43
It's most likely because of missing patches for CombineFileInputFormat from Hadoop. Can you try what Amareshwari suggested (adding the HIVE-5759 patch) or try Hadoop 0.20.2 (which contains HIVE-5759)? According to Dhruba at FB, we use Hadoop 0.20.0 and have applied a number of patches from trunk (including all patches involving CombineFileInputFormat).
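One way to test the CombineFileInputFormat theory before patching or upgrading Hadoop is to rerun the insert with Hive's small-file merge step turned off. The sketch below assumes that the failing second job is that merge step, which the thread does not confirm. `hive.merge.mapfiles` is a standard Hive setting; the helper name `run_without_merge` and the `HIVE_BIN` override are hypothetical, added only so the launcher path can be swapped.

```shell
# Sketch: run a query with Hive's merge of map-only job outputs disabled.
# run_without_merge and HIVE_BIN are hypothetical names for illustration;
# HIVE_BIN defaults to bin/hive, the launcher used earlier in this thread.
# Assumption: the failing Stage-2 job is the small-file merge step.
run_without_merge() {
  "${HIVE_BIN:-bin/hive}" -e "set hive.merge.mapfiles=false; $1"
}
```

For the query from this thread, that would be `run_without_merge "insert overwrite table numbers_text_part partition(part='p1') select id, num from numbers_text;"`. If the insert then succeeds, that points at the merge job rather than the insert itself, but it is a diagnostic step and not a substitute for the Hadoop patch discussed above.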
|