

Yang 2013-03-11, 07:11
Re: null pointer error with a simple pig program
Sounds like a bug in the S3 implementation of FileSystem? Does this happen
with pig 0.10 or 0.11?

On Mon, Mar 11, 2013 at 12:11 AM, Yang <[EMAIL PROTECTED]> wrote:

> the following code gave null pointer exception
>
>
> ---------------------------------------------------------------------------------------
>
> rbl_raw = load 's3://mybucket/rbl-logs/{2013/03/06,2013/03/05}' AS
> (line:chararray);
>
> rbl = FOREACH rbl_raw GENERATE FLATTEN(loadrbl(line)) AS (x:chararray,
> y:chararray);
>
> seo_rbl = FILTER rbl BY x IS NOT NULL AND y == 'seo_google';
>
> rbl1 = GROUP seo_rbl BY x;
>
> STORE rbl1 INTO '/user/hadoop/blah';
>
>
> -------------------------------------------------------------------------------
>
>
>
>
> Pig Stack Trace
> ---------------
> ERROR 2017: Internal error creating job configuration.
>
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException:
> ERROR 2017: Internal error creating job configuration.
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:750)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:267)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:151)
>         at org.apache.pig.PigServer.launchPlan(PigServer.java:1313)
>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1298)
>         at org.apache.pig.PigServer.execute(PigServer.java:1288)
>         at org.apache.pig.PigServer.executeBatch(PigServer.java:360)
>         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>         at org.apache.pig.Main.run(Main.java:568)
>         at org.apache.pig.Main.main(Main.java:114)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:994)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:967)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:798)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:773)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:611)
>         ... 17 more
>
> ===============================================================================
>
>
> version of pig is 0.9.2:
>
> hadoop@ip-10-147-131-60:/mnt/run$ pig -version
> Apache Pig version 0.9.2-amzn (rexported)
>
> the weird thing is that if I take out the GROUP BY, it works fine; if I
> take out the glob in the initial LOAD statement and just load one dir, it
> works fine; and if I load both dirs with the glob, apply the loadrbl()
> UDF, STORE the result into an intermediate dir, then LOAD that
> intermediate result and continue the original computation all the way
> through the GROUP BY, it works fine too.
>
>
> so why does the GROUP BY have a problem with the glob above, when they
> are far apart and all the intermediate steps work fine?
>
>
> thanks
> Yang
>
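For reference (not from the thread): Hadoop-style brace globs expand the alternatives in place, so the LOAD pattern above should resolve to the two daily directories. A minimal Python sketch of that expansion, assuming non-nested brace semantics like Hadoop's glob matching:

```python
import re

def expand_braces(pattern):
    """Expand a {a,b,...} alternation in place (sketch: handles
    non-nested brace groups, which is all the LOAD pattern uses)."""
    m = re.search(r'\{([^{}]*)\}', pattern)
    if not m:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    results = []
    for alt in m.group(1).split(','):
        results.extend(expand_braces(head + alt + tail))
    return results

paths = expand_braces('s3://mybucket/rbl-logs/{2013/03/06,2013/03/05}')
# → ['s3://mybucket/rbl-logs/2013/03/06', 's3://mybucket/rbl-logs/2013/03/05']
```

Since the NullPointerException surfaces inside FileSystem.globStatus during reducer estimation, the failure appears to happen while matching these expanded paths against the S3 listing, not in the GROUP BY logic itself; the GROUP BY merely triggers the reducer-count estimate that walks the input paths.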
Dmitriy Ryaboy 2013-03-12, 21:56