Re: NullPointerException in GenericUDTFExplode.process()
Also wanted to mention that I'm using the Cloudera distribution of Hive
(0.5.0+20-2) on CentOS.

Marc

On Sun, Aug 8, 2010 at 7:33 PM, Marc Limotte <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I think I may have run into a Hive bug.  And I'm not sure what's causing it
> or how to work around it.
>
> The reduce task log contains this exception:
>
> java.io.IOException: java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:227)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode.process(GenericUDTFExplode.java:70)
>     at org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98)
>     at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:81)
>     at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
>     at org.apache.hadoop.hive.ql.exec.LimitOperator.processOp(LimitOperator.java:46)
>     at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
>     at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:43)
>     at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
>     at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:218)
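>
> My guess, since the "Caused by" frame lands inside
> GenericUDTFExplode.process() itself, is that explode() is being handed a
> NULL array for that row and iterates it without a null check.  That is
> speculation on my part; the probe and the filtered query further down are
> both written under that assumption.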
>
> This works fine for millions of rows of data, but the one row below causes
> the whole job to fail.  Looking at the row, I don't see anything that
> distinguishes it; if I knew what it was about the row that causes the
> problem, I could filter it out beforehand.  I don't mind losing one row in
> a million.
>
> 2010-08-05^A15^A^AUS^A1281022768^Af^A97^Aonline car insurance
> quote^Aborderdisorder.com^A\N^A^A1076^B1216^B1480^B1481^B1493^B1496^B1497^B1504^B1509^B1686^B1724^B1729^B1819^B1829^B1906^B1995^B2018^B2025^B421^B426^B428^B433^B436^B449^B450^B452^B462^B508^B530^B-
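>
> If the NULL-array theory holds, a probe along these lines should make the
> odd row stand out (just a sketch; size() is the standard Hive UDF for
> collection length, but I have not verified it against 0.5):
>
> -- Hypothetical probe: show null-ness and element count of the array
> -- for the same 88 rows the failing query reads.
> SELECT receiver_code_list IS NULL, size(receiver_code_list)
> FROM (SELECT receiver_code_list FROM tmp3 LIMIT 88) tmp5;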
>
> The source table and query are:
>
> CREATE TABLE IF NOT EXISTS tmp3 (
>   dt                  STRING,
>   hr                  STRING,
>   fld1                STRING,
>   fld2                STRING,
>   stamp               BIGINT,
>   fld3                STRING,
>   fld4                INT,
>   rk                  STRING,
>   rd                  STRING,
>   rq                  STRING,
>   kl                  ARRAY<STRING>,
>   receiver_code_list  ARRAY<STRING>
> )
> ROW FORMAT DELIMITED
> STORED AS SEQUENCEFILE;
>
> -- The LIMIT 88 below is there so that the one bad row is included; with
> -- LIMIT 87 the query runs without failure.
> SELECT count(1)
> FROM (select receiver_code_list from tmp3 limit 88) tmp5
> LATERAL VIEW explode(receiver_code_list) rcl AS receiver_code;
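>
> Under the same assumption, filtering NULL and empty arrays out before the
> LATERAL VIEW might drop the bad row while keeping everything else (again a
> sketch, not tested on this build):
>
> SELECT count(1)
> FROM (
>   SELECT receiver_code_list
>   FROM tmp3
>   WHERE receiver_code_list IS NOT NULL
>     AND size(receiver_code_list) > 0
>   LIMIT 88
> ) tmp5
> LATERAL VIEW explode(receiver_code_list) rcl AS receiver_code;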
>
> Any tips on what is wrong, or on how else I might go about debugging it,
> would be appreciated.  A way to have it skip the rows that cause errors
> would be an acceptable solution as well.
>
> Thanks,
> Marc
>
>
>