Pig >> mail # user >> FLATTEN(bag_of_tuples) error in 0.8.1 ?


Re: FLATTEN(bag_of_tuples) error in 0.8.1 ?
We use CDH3u3.

Unfortunately, because of our company's ops experience, we have to stick
with CDH3u3 and Pig 0.8.1.

On Wed, Jul 18, 2012 at 5:39 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> pig 0.8.1 isn't really seeing any active development at all. Is there a
> reason why you can't use 0.10.0?
>
> 2012/7/18 Yang <[EMAIL PROTECTED]>
>
> > This actually caused a rather nasty bug today.
> >
> > In another UDF that returns a bag of tuples, I originally wrapped the
> > tuple schema in a FieldSchema inside the bag, and the schema for
> > FLATTEN(myudf()) was
> >
> > mytuple::field1, mytuple::field2,
> >
> > but the values of all the fields were actually expanded into the root
> > level, and overwrote another field that had the same name but without
> > the "mytuple::" part.
> >
> > This is on 0.8.1.
> >
> >
> >
> >
> > On Tue, Jul 17, 2012 at 11:25 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> >
> > > In 0.10 you should have bag -> tuple -> elements.
> > >
> > > 2012/7/17 Yang <[EMAIL PROTECTED]>
> > >
> > > > OK, found the issue.
> > > >
> > > > Now I do not create an explicit FieldSchema for the inner tuple
> > > > Schema, but insert the tuple schema directly into the bag. Then it
> > > > works.
> > > >
> > > > This is indeed a difference between 0.8.1 and 0.10, because the
> > > > original works on 0.10, and the new one only works on 0.8.1.
> > > >
> > > > On Tue, Jul 17, 2012 at 4:59 PM, Yang <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > > I created a UDF that returns a bag of tuples. The syntax is all
> > > > > > fine, but when I run it in Pig, Pig gives this error:
> > > > > > 12/07/17 16:51:58 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> > > > > > 12/07/17 16:51:58 WARN mapred.LocalJobRunner: job_local_0001
> > > > > > java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
> > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:392)
> > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:342)
> > > > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
> > > > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:237)
> > > > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
> > > > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> > > > > >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > > > >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> > > > > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > > > > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > > > > > 12/07/17 16:51:58 INFO mapReduceLayer.MapReduceLauncher: HadoopJobId: job_local_0001
> > > > >
> > > > >
> > > > >
> > > > > > It looks like the returned value is wrong somehow, but I checked
> > > > > > the outputSchema() method, and it is exactly the same as in the
> > > > > > online docs. Where am I wrong?
> > > > > > ---- This is Pig 0.8.1. I posted a question about 1 month ago,
> > > > > > stating that 0.8.1 FLATTEN(bag_of_tuples) behavior is different
> > > > > > from 0.10.0, in that it keeps the enclosing tuple, while 0.10.0
> > > > > > strips it and places the fields at the root level.
> > > > >
> > > > >
> > > > >
> > > > > Thanks!
> > > > > yang
> > > > >
> > > > > ///// DemoUdf.java
> > > > >
> > > > > import java.io.IOException;
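
The field-name collision reported in this thread (flattened fields landing at the root level without the "mytuple::" prefix and overwriting an outer field of the same name) can be illustrated with a toy model. The sketch below is plain Java and uses no Pig classes; the class name FlattenCollisionSketch, the flatten() helper, and the sample values are all hypothetical, and it does not claim to reproduce how Pig 0.8.1 or 0.10 implements FLATTEN. It only shows why a missing prefix can silently clobber an existing field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only: maps stand in for Pig rows; this is NOT Pig's implementation. */
public class FlattenCollisionSketch {

    /**
     * Merge the fields of one flattened tuple into a copy of the outer row.
     * With keepPrefix = true each inner field is stored as "prefix::name";
     * with keepPrefix = false it is stored under its bare name, so it can
     * overwrite an outer field that happens to have the same name.
     */
    public static Map<String, Object> flatten(Map<String, Object> outerRow,
                                              String prefix,
                                              Map<String, Object> innerTuple,
                                              boolean keepPrefix) {
        Map<String, Object> result = new LinkedHashMap<>(outerRow);
        for (Map.Entry<String, Object> field : innerTuple.entrySet()) {
            String name = keepPrefix
                    ? prefix + "::" + field.getKey()
                    : field.getKey();
            result.put(name, field.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical outer row that already has a field called "field1".
        Map<String, Object> outer = new LinkedHashMap<>();
        outer.put("field1", "outer-value");

        // Hypothetical tuple produced by the UDF.
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put("field1", "udf-value-1");
        inner.put("field2", "udf-value-2");

        // Prefixed names: the outer "field1" survives.
        System.out.println(flatten(outer, "mytuple", inner, true));
        // Bare names: the outer "field1" is overwritten.
        System.out.println(flatten(outer, "mytuple", inner, false));
    }
}
```

In the prefixed case the result has three fields and the outer "field1" keeps its value; in the bare-name case the result has only two fields and "field1" holds the UDF's value. In a real script, running DESCRIBE on the relation after the FLATTEN is one way to check which names the fields actually ended up with.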