NullPointerException in GenericUDTFExplode.process()
Hi,

I think I may have run into a Hive bug, and I'm not sure what's causing it
or how to work around it.

The reduce task log contains this exception:

java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:227)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode.process(GenericUDTFExplode.java:70)
    at org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:81)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
    at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:46)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
    at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:43)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:218)

This works fine for millions of rows of data, but the one row below causes
the whole job to fail.  Looking at the row, I don't see anything that
distinguishes it.  If I knew what it was about the row that caused the
problem, I could filter it out beforehand (a rough diagnostic I was planning
to try is sketched after the row).  I don't mind losing one row in a million.

2010-08-05^A15^A^AUS^A1281022768^Af^A97^Aonline car insurance
quote^Aborderdisorder.com^A\N^A^A1076^B1216^B1480^B1481^B1493^B1496^B1497^B1504^B1509^B1686^B1724^B1729^B1819^B1829^B1906^B1995^B2018^B2025^B421^B426^B428^B433^B436^B449^B450^B452^B462^B508^B530^B-
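
For what it's worth, this is the sort of check I was planning to run next,
to see whether the array column deserializes as NULL or empty for that row.
The size()/IS NULL tests are just my guesses at what might distinguish it;
I haven't confirmed that they do:

-- Inspect how the array column deserializes for the rows in the failing
-- LIMIT window, without calling explode(), so this query itself should
-- not hit the NPE.
SELECT
  receiver_code_list IS NULL AS list_is_null,
  size(receiver_code_list)   AS list_size,
  receiver_code_list
FROM (SELECT receiver_code_list FROM tmp3 LIMIT 88) tmp5;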

The source table and query are:

CREATE TABLE IF NOT EXISTS tmp3 (
  dt                  STRING,
  hr                  STRING,
  fld1                STRING,
  fld2                STRING,
  stamp               BIGINT,
  fld3                STRING,
  fld4                INT,
  rk                  STRING,
  rd                  STRING,
  rq                  STRING,
  kl                  ARRAY<STRING>,
  receiver_code_list  ARRAY<STRING>
)
ROW FORMAT DELIMITED
STORED AS SEQUENCEFILE;

-- The LIMIT 88 below is so that the one bad row is included; with LIMIT 87
-- it works without failure.
SELECT count(1)
FROM (select receiver_code_list from tmp3 limit 88) tmp5
LATERAL VIEW explode(receiver_code_list) rcl AS receiver_code;

Any tips on what is wrong, or how else I might go about debugging it, would
be appreciated.  A way to have it skip rows that cause errors would be an
acceptable solution as well.
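
For example, if the problem turns out to be a NULL or empty array, something
like this is what I had in mind as a workaround.  It is only a sketch; I
haven't verified that this condition actually catches the bad row:

-- Filter out NULL/empty arrays before handing rows to explode().
SELECT count(1)
FROM (
  SELECT receiver_code_list
  FROM tmp3
  WHERE receiver_code_list IS NOT NULL
    AND size(receiver_code_list) > 0
  LIMIT 88
) tmp5
LATERAL VIEW explode(receiver_code_list) rcl AS receiver_code;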

Thanks,
Marc