-
NullPointerException in GenericUDTFExplode.process()
Marc Limotte 2010-08-09, 02:33
Hi,
I think I may have run into a Hive bug, and I'm not sure what's causing it or how to work around it.
The reduce task log contains this exception:
java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:227)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode.process(GenericUDTFExplode.java:70)
    at org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:81)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
    at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:46)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
    at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:43)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:218)
The query works fine for millions of rows of data, but the one row below causes the whole job to fail. Looking at the row, I don't see anything that distinguishes it. If I knew what it was about the row that caused the problem, I could filter it out beforehand. I don't mind losing one row in a million.
2010-08-05^A15^A^AUS^A1281022768^Af^A97^Aonline car insurance quote^Aborderdisorder.com^A\N^A^A1076^B1216^B1480^B1481^B1493^B1496^B1497^B1504^B1509^B1686^B1724^B1729^B1819^B1829^B1906^B1995^B2018^B2025^B421^B426^B428^B433^B436^B449^B450^B452^B462^B508^B530^B-
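One way to inspect a suspect row outside Hive is to split it by hand and compare the pieces against the table's columns. The following is a minimal Python sketch, not part of the original post. It assumes that "^A" and "^B" in the row above stand for the control bytes 0x01 and 0x02 (Hive's default field and collection-item delimiters for ROW FORMAT DELIMITED); the helper names `from_caret_notation` and `parse_row` are hypothetical. It only splits raw text and does not model how Hive's SerDe turns empty strings or \N into NULL.

```python
# Assumption: "^A"/"^B" in the post are caret notation for bytes 0x01/0x02,
# Hive's default field and collection-item delimiters.
FIELD_SEP = "\x01"
ITEM_SEP = "\x02"

# Column order taken from the CREATE TABLE statement in the post.
COLUMNS = ["dt", "hr", "fld1", "fld2", "stamp", "fld3", "fld4",
           "rk", "rd", "rq", "kl", "receiver_code_list"]
ARRAY_COLUMNS = {"kl", "receiver_code_list"}

# The failing row, copied from the post (raw strings keep "\N" literal).
SAMPLE = (
    r"2010-08-05^A15^A^AUS^A1281022768^Af^A97^Aonline car insurance quote"
    r"^Aborderdisorder.com^A\N^A^A"
    r"1076^B1216^B1480^B1481^B1493^B1496^B1497^B1504^B1509^B1686^B1724^B1729"
    r"^B1819^B1829^B1906^B1995^B2018^B2025^B421^B426^B428^B433^B436^B449^B450"
    r"^B452^B462^B508^B530^B-"
)


def from_caret_notation(text):
    """Turn the printable ^A / ^B notation back into the control bytes."""
    return text.replace("^A", FIELD_SEP).replace("^B", ITEM_SEP)


def parse_row(line):
    """Split one delimited line into a dict keyed by column name.

    Array columns are split again on the item delimiter. Raises ValueError
    when the number of fields does not match the number of columns.
    """
    fields = line.split(FIELD_SEP)
    if len(fields) != len(COLUMNS):
        raise ValueError(
            "expected %d fields, found %d" % (len(COLUMNS), len(fields)))
    row = {}
    for name, raw in zip(COLUMNS, fields):
        row[name] = raw.split(ITEM_SEP) if name in ARRAY_COLUMNS else raw
    return row


row = parse_row(from_caret_notation(SAMPLE))
for name in COLUMNS:
    print("%-20s %r" % (name, row[name]))
print("receiver_code_list items:", len(row["receiver_code_list"]))
```

Running the same check over the neighbouring rows would show whether the failing row differs in field count or in the contents of either array column, which is the kind of difference that is hard to spot by eye in the raw text.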
The source table and query are:
CREATE TABLE IF NOT EXISTS tmp3 (
  dt STRING,
  hr STRING,
  fld1 STRING,
  fld2 STRING,
  stamp BIGINT,
  fld3 STRING,
  fld4 INT,
  rk STRING,
  rd STRING,
  rq STRING,
  kl ARRAY<String>,
  receiver_code_list ARRAY<String>
)
ROW FORMAT DELIMITED
STORED AS SEQUENCEFILE;
-- The limit 88 below is so that the one bad row is included; if I limit to 87, it works without failure.
SELECT count(1)
FROM (select receiver_code_list from tmp3 limit 88) tmp5
LATERAL VIEW explode(receiver_code_list) rcl AS receiver_code;
Any tips on what is wrong, or on how else I might go about debugging it, would be appreciated. A way to have it skip rows that cause errors would also be an acceptable solution.
Thanks, Marc
-
RE: NullPointerException in GenericUDTFExplode.process()
Paul Yang 2010-08-09, 03:14
Seems like an issue that was already patched. Can you check whether the column you are calling explode() on has any null values?
-
Re: NullPointerException in GenericUDTFExplode.process()
Marc Limotte 2010-08-09, 18:32
Hi Paul,
No nulls. I ensure that every row has at least one entry (a hyphen) before I split to create the list.
Marc
-
Re: NullPointerException in GenericUDTFExplode.process()
Marc Limotte 2010-08-10, 00:54
Also wanted to mention that I'm using the Cloudera distribution of Hive (0.5.0+20-2) on CentOS.
Marc