Pig user mailing list: How to handle optional fields in schema


Re: How to handle optional fields in schema
I bet you are doing ILLUSTRATE in your Pig script; that may have a problem.
Just do either DUMP or STORE instead and your script should work fine.
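
Something like this (untested sketch; the alias and output path are placeholders, the load line is from the original script) should avoid the ILLUSTRATE code path:

  logs = LOAD '$INPUT' USING PigStorage('\t') AS (f1, f2, f3);
  -- ILLUSTRATE logs;  -- this is what hits the example generator and fails
  DUMP logs;           -- or: STORE logs INTO '$OUTPUT' USING PigStorage('\t');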

Ashutosh
On Sat, Nov 12, 2011 at 17:03, B M D Gill <[EMAIL PROTECTED]> wrote:

> Thanks Dmitriy, will mention it to Amazon for sure.
>
> That was the first thing I tried and it didn't seem to work.  Not sure
> what I could be doing wrong.  I get an index out of bounds error where the
> index corresponds to the first instance of the optional field.  Here is
> the stack trace:
>
> Pig Stack Trace
> ---------------
> ERROR 2999: Unexpected internal error. Index: 29, Size: 29
>
> java.lang.IndexOutOfBoundsException: Index: 29, Size: 29
>         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>         at java.util.ArrayList.get(ArrayList.java:322)
>         at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
>         at org.apache.pig.pen.util.ExampleTuple.get(ExampleTuple.java:80)
>         at org.apache.pig.pen.AugmentBaseDataVisitor.visit(AugmentBaseDataVisitor.java:427)
>         at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:210)
>         at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:52)
>         at org.apache.pig.pen.util.PreOrderDepthFirstWalker.depthFirst(PreOrderDepthFirstWalker.java:70)
>         at org.apache.pig.pen.util.PreOrderDepthFirstWalker.depthFirst(PreOrderDepthFirstWalker.java:72)
>         at org.apache.pig.pen.util.PreOrderDepthFirstWalker.walk(PreOrderDepthFirstWalker.java:55)
>         at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>         at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:121)
>         at org.apache.pig.PigServer.getExamples(PigServer.java:731)
>         at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:557)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:246)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
>         at org.apache.pig.Main.main(Main.java:374)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> ===============================================================================
>
>
> On Sun, Nov 13, 2011 at 12:30 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]>
> wrote:
>
> > If you change the load statement to "load '$input' as (f1, f2, f3, f4,
> > f5)", f4 and f5 will be treated as null if they are absent in the raw
> > logs.
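> >
> > Something like this (untested sketch; the aliases are placeholders):
> >
> >   raw = LOAD '$input' USING PigStorage('\t') AS (f1, f2, f3, f4, f5);
> >   -- rows from the old logs come through with f4 and f5 set to null
> >   newer = FILTER raw BY f4 IS NOT NULL;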
> >
> > If you start relying on Pig heavily, lobby Amazon to upgrade their
> > version of Pig (or at least provide both 0.6 and 0.9.1). At this
> > point, 0.6 is positively ancient. But the extra field behavior worked
> > that way then, too.
> >
> > D
> >
> > On Sat, Nov 12, 2011 at 4:08 PM, B M D Gill <[EMAIL PROTECTED]> wrote:
> > > I'm a newbie running Pig 0.6 on Amazon Elastic Map Reduce.  I need to
> > make
> > > a change to add additional fields to the log files that I run my pig
> jobs
> > > on  and am wondering how do I handle this schema in pig.
> > >
> > > My current inputs are tab separated fields that I input using the
> > standard
> > > pig storage function:
> > >
> > > LOAD '$INPUT' USING PigStorage('\t') as (f1, f2, f3);
> > >
> > > However some input files will now have additional fields f4, f5, f6
> etc.
> > at
> > > the trailing edge of each line.  How do I set up the load function to
> > > handle these optional fields?  Do I need to make changes to my logic to
> > > deal with these fields possibly being empty or will Pig simply record
> > their
> > > value as null if they are absent?
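> > >
> > > To make that concrete (made-up values; <TAB> stands for a tab character),
> > > an old row and a new row would look like:
> > >
> > >   val1<TAB>val2<TAB>val3
> > >   val1<TAB>val2<TAB>val3<TAB>val4<TAB>val5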
> > >
> > > Thanks to anyone who can share some insight.