Pig >> mail # user >> How to handle optional fields in schema


Re: How to handle optional fields in schema
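I bet you are running ILLUSTRATE in your Pig script; the stack trace goes
through GruntParser.processIllustrate. ILLUSTRATE may have a problem with
rows that are missing the optional fields. Use DUMP or STORE instead and
your script should work fine.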
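
A minimal sketch of that, assuming the optional fields are named f4 and f5 as in the suggestion quoted below. The alias names and the $OUTPUT parameter are made up for illustration, and I have not run this against 0.6:

```pig
-- Declare the optional trailing fields in the schema (f4, f5 are placeholders).
raw = LOAD '$INPUT' USING PigStorage('\t') AS (f1, f2, f3, f4, f5);

-- Rows from older files carry no f4/f5; per the quoted advice they should
-- come through as null, so guard any logic that depends on them.
with_f4 = FILTER raw BY f4 IS NOT NULL;

-- Materialize with DUMP or STORE rather than ILLUSTRATE.
DUMP with_f4;
-- STORE with_f4 INTO '$OUTPUT' USING PigStorage('\t');
```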

Ashutosh
On Sat, Nov 12, 2011 at 17:03, B M D Gill <[EMAIL PROTECTED]> wrote:

> Thanks Dmitriy, will mention it to Amazon for sure.
>
> That was the first thing I tried, and it didn't work.  I'm not sure what I
> could be doing wrong.  I get an IndexOutOfBoundsException where the index
> corresponds to the first instance of the optional field.  Here is the
> stack trace:
>
> Pig Stack Trace
> ---------------
> ERROR 2999: Unexpected internal error. Index: 29, Size: 29
>
> java.lang.IndexOutOfBoundsException: Index: 29, Size: 29
>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>     at java.util.ArrayList.get(ArrayList.java:322)
>     at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
>     at org.apache.pig.pen.util.ExampleTuple.get(ExampleTuple.java:80)
>     at org.apache.pig.pen.AugmentBaseDataVisitor.visit(AugmentBaseDataVisitor.java:427)
>     at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:210)
>     at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:52)
>     at org.apache.pig.pen.util.PreOrderDepthFirstWalker.depthFirst(PreOrderDepthFirstWalker.java:70)
>     at org.apache.pig.pen.util.PreOrderDepthFirstWalker.depthFirst(PreOrderDepthFirstWalker.java:72)
>     at org.apache.pig.pen.util.PreOrderDepthFirstWalker.walk(PreOrderDepthFirstWalker.java:55)
>     at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>     at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:121)
>     at org.apache.pig.PigServer.getExamples(PigServer.java:731)
>     at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:557)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:246)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
>     at org.apache.pig.Main.main(Main.java:374)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> ================================================================================
>
>
> On Sun, Nov 13, 2011 at 12:30 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]>
> wrote:
>
> > If you change the load statement to "load '$input' as (f1, f2, f3, f4,
> > f5)", f4 and f5 will be treated as null if they are absent from the raw
> > logs.
> >
> > If you start relying on Pig heavily, lobby Amazon to upgrade their
> > version of Pig (or at least provide both 0.6 and 0.9.1). At this
> > point, 0.6 is positively ancient. But the extra field behavior worked
> > that way then, too.
> >
> > D
> >
> > On Sat, Nov 12, 2011 at 4:08 PM, B M D Gill <[EMAIL PROTECTED]> wrote:
> > > I'm a newbie running Pig 0.6 on Amazon Elastic Map Reduce.  I need to
> > > make a change to add additional fields to the log files that I run my
> > > pig jobs on and am wondering how do I handle this schema in pig.
> > >
> > > My current inputs are tab separated fields that I input using the
> > > standard pig storage function:
> > >
> > > LOAD '$INPUT' USING PigStorage('\t') as (f1, f2, f3);
> > >
> > > However some input files will now have additional fields f4, f5, f6
> > > etc. at the trailing edge of each line.  How do I set up the load
> > > function to handle these optional fields?  Do I need to make changes to
> > > my logic to deal with these fields possibly being empty or will Pig
> > > simply record their value as null if they are absent?
> > >
> > > Thanks to anyone who can share some insight.