Pig user mailing list
null pointer error with a simple pig program
The following code throws a NullPointerException:

---------------------------------------------------------------------------------------

-- loadrbl() is my own UDF; its REGISTER/DEFINE statements are not shown here
rbl_raw = LOAD 's3://mybucket/rbl-logs/{2013/03/06,2013/03/05}' AS (line:chararray);

rbl = FOREACH rbl_raw GENERATE FLATTEN(loadrbl(line)) AS (x:chararray, y:chararray);

seo_rbl = FILTER rbl BY x IS NOT NULL AND y == 'seo_google';

rbl1 = GROUP seo_rbl BY x;

STORE rbl1 INTO '/user/hadoop/blah';

-------------------------------------------------------------------------------
Pig Stack Trace
---------------
ERROR 2017: Internal error creating job configuration.

org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:750)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:267)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:151)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1313)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1298)
        at org.apache.pig.PigServer.execute(PigServer.java:1288)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:360)
        at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
        at org.apache.pig.Main.run(Main.java:568)
        at org.apache.pig.Main.main(Main.java:114)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:994)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:967)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:798)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:773)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:611)
        ... 17 more
===============================================================================
The Pig version is 0.9.2:
hadoop@ip-10-147-131-60:/mnt/run$ pig -version
Apache Pig version 0.9.2-amzn (rexported)
The weird thing is that the script works fine if I take out the GROUP BY. It also works if I take out the glob in the initial LOAD statement and load just one directory. Finally, it works if I load both directories with the glob, store the output of the loadrbl() UDF in an intermediate directory, then load that intermediate result and run the rest of the original computation all the way through the GROUP BY.
So why does the GROUP BY have a problem with the glob above, when the two statements are far apart and all the intermediate steps work fine?
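The stack trace narrows this down somewhat: the NullPointerException is thrown inside FileSystem.globStatus while Pig runs getTotalInputFileSize for estimateNumberOfReducers, that is, while it plans a job with a reduce phase. The GROUP BY is what creates that reduce phase, which would be consistent with the map-only variant never reaching this code. In Hadoop versions of that era, globStatus also returns null, not an empty array, for a path with no wildcard that does not exist. The Java below is a toy model built on those observations, not Pig's or Hadoop's source: the class GlobNpeSketch, its in-memory FS map and all of its methods are hypothetical. It assumes the brace pattern is expanded into plain paths that are looked up one at a time, and that at least one expanded path comes back as missing; the trace alone does not show why that would happen for directories that exist.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only, NOT Pig's or Hadoop's code. All names and behaviour are assumptions
// chosen to mirror the trace: estimateNumberOfReducers -> getTotalInputFileSize -> globStatus.
class GlobNpeSketch {
    // Hypothetical in-memory "filesystem": plain path -> sizes of the files under it.
    static final Map<String, long[]> FS = new HashMap<>();

    // Expand one top-level {a,b,...} group: "x/{a,b}/y" -> ["x/a/y", "x/b/y"].
    static List<String> expandBraces(String pattern) {
        int open = pattern.indexOf('{');
        int close = open < 0 ? -1 : pattern.indexOf('}', open);
        List<String> out = new ArrayList<>();
        if (open < 0 || close < 0) {
            out.add(pattern); // no braces: the pattern is a single plain path
            return out;
        }
        String prefix = pattern.substring(0, open);
        String suffix = pattern.substring(close + 1);
        for (String alt : pattern.substring(open + 1, close).split(",")) {
            out.add(prefix + alt + suffix);
        }
        return out;
    }

    // Assumed to behave like globStatus on a wildcard-free path: null when the path is missing.
    static long[] status(String path) {
        return FS.get(path);
    }

    // Sums sizes over every expanded path with no null check, so one missing path throws NPE.
    static long totalInputSizeUnchecked(String pattern) {
        long total = 0;
        for (String p : expandBraces(pattern)) {
            for (long size : status(p)) { // NullPointerException if status(p) returned null
                total += size;
            }
        }
        return total;
    }

    // Same sum, but skips paths that come back null instead of crashing.
    static long totalInputSizeChecked(String pattern) {
        long total = 0;
        for (String p : expandBraces(pattern)) {
            long[] sizes = status(p);
            if (sizes == null) {
                continue;
            }
            for (long size : sizes) {
                total += size;
            }
        }
        return total;
    }

    // Hypothetical planner step: only a job with a reduce phase (e.g. from GROUP) sizes its input.
    static int estimateReducers(boolean hasReducePhase, String loadPattern, long bytesPerReducer) {
        if (!hasReducePhase) {
            return 0; // map-only job: in this model the load pattern is never resolved here
        }
        long total = totalInputSizeUnchecked(loadPattern);
        return (int) Math.max(1, (total + bytesPerReducer - 1) / bytesPerReducer);
    }
}
```

In this model, the single-directory LOAD and the intermediate-store workaround both correspond to a pattern without braces, so only one existing path is looked up and the estimate succeeds. totalInputSizeChecked shows the kind of null guard that, in the model, avoids the crash.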
thanks
Yang