Pig, mail # user - null pointer error with a simple pig program


Yang 2013-03-11, 07:11
The following code throws a NullPointerException:

---------------------------------------------------------------------------------------

rbl_raw = LOAD 's3://mybucket/rbl-logs/{2013/03/06,2013/03/05}' AS (line:chararray);

rbl = FOREACH rbl_raw GENERATE FLATTEN(loadrbl(line)) AS (x:chararray, y:chararray);

seo_rbl = FILTER rbl BY x IS NOT NULL AND y == 'seo_google';

rbl1 = GROUP seo_rbl BY x;

STORE rbl1 INTO '/user/hadoop/blah';

-------------------------------------------------------------------------------
Pig Stack Trace
---------------
ERROR 2017: Internal error creating job configuration.

org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:750)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:267)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:151)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1313)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1298)
        at org.apache.pig.PigServer.execute(PigServer.java:1288)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:360)
        at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
        at org.apache.pig.Main.run(Main.java:568)
        at org.apache.pig.Main.main(Main.java:114)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:994)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:967)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:798)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:773)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:611)
        ... 17 more
===============================================================================
The Pig version is 0.9.2:
hadoop@ip-10-147-131-60:/mnt/run$ pig -version
Apache Pig version 0.9.2-amzn (rexported)
The weird thing is that each of the following changes makes it work:
if I take out the GROUP BY; if I take out the glob in the initial LOAD
statement and load just one directory; or if I load both directories
with the glob, store the output of the loadrbl() UDF in an intermediate
directory, then load that intermediate result and continue the original
computation all the way to the GROUP BY.
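
For reference, the intermediate-directory variant looks roughly like the
sketch below. The intermediate path '/user/hadoop/rbl_parsed' and the alias
rbl_parsed are placeholders chosen for illustration, and the sketch assumes
the default tab-delimited storage and that loadrbl is registered the same
way as in the failing script. Treat it as an outline, not the exact script.

```pig
-- Stage 1: load both days with the glob, parse each line, store the parsed rows.
rbl_raw = LOAD 's3://mybucket/rbl-logs/{2013/03/06,2013/03/05}' AS (line:chararray);
rbl = FOREACH rbl_raw GENERATE FLATTEN(loadrbl(line)) AS (x:chararray, y:chararray);
STORE rbl INTO '/user/hadoop/rbl_parsed';  -- placeholder intermediate directory

-- Stage 2: reload the parsed rows from the plain directory (no glob) and continue.
rbl_parsed = LOAD '/user/hadoop/rbl_parsed' AS (x:chararray, y:chararray);
seo_rbl = FILTER rbl_parsed BY x IS NOT NULL AND y == 'seo_google';
rbl1 = GROUP seo_rbl BY x;
STORE rbl1 INTO '/user/hadoop/blah';
```

The only difference from the failing script is that the job containing the
GROUP BY reads from a plain directory instead of the S3 glob.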
So why does the GROUP BY have a problem with the glob above, when they
are far apart and all the intermediate steps work fine?
thanks
Yang