Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Pig >> mail # user >> FLATTEN(bag_of_tuples) error in 0.8.1 ?


Re: FLATTEN(bag_of_tuples) error in 0.8.1 ?
Pig 0.8.1 isn't really seeing any active development at all. Is there a
reason you can't use 0.10.0?

2012/7/18 Yang <[EMAIL PROTECTED]>

> This actually caused a rather nasty bug today.
>
> In another UDF that returns a bag of tuples, I originally inserted the
> tuple into a FieldSchema inside the bag, and the schema for
> FLATTEN(myudf()) was reported as
>
> mytuple::field1, mytuple::field2,
>
> but the values of all the fields were actually expanded into the root
> level and overwrote another field with the same name, just without the
> "mytuple::" part.
>
> This is on 0.8.1.
>
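A toy model of the symptom described above, in plain Java with no Pig dependency. It is not Pig's implementation: rows are modeled as name-keyed maps, and the `FlattenCollisionDemo.flattenInto` helper is hypothetical. It only shows why dropping the "mytuple::" prefix lets a flattened field silently replace a root-level field with the same name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FlattenCollisionDemo {
    // Toy model, not Pig's implementation: copy a flattened tuple's fields into
    // a row keyed by field name. keepPrefix = true yields "mytuple::field1";
    // keepPrefix = false puts the fields at the root level, where put()
    // silently replaces any existing field with the same name.
    static Map<String, Object> flattenInto(Map<String, Object> row,
                                           String tupleAlias,
                                           Map<String, Object> tupleFields,
                                           boolean keepPrefix) {
        Map<String, Object> out = new LinkedHashMap<>(row); // leave the input row untouched
        for (Map.Entry<String, Object> e : tupleFields.entrySet()) {
            String key = keepPrefix ? tupleAlias + "::" + e.getKey() : e.getKey();
            out.put(key, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("field1", "original");
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("field1", "from-udf");
        tuple.put("field2", 42);

        System.out.println(flattenInto(row, "mytuple", tuple, true));
        // prints {field1=original, mytuple::field1=from-udf, mytuple::field2=42}
        System.out.println(flattenInto(row, "mytuple", tuple, false));
        // prints {field1=from-udf, field2=42}
    }
}
```

In a real Pig script the aliases are presumably resolved to column positions at compile time, so the actual failure is a mismatch between the declared schema and the data the UDF emits; the map only mimics how the names end up colliding.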
> On Tue, Jul 17, 2012 at 11:25 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>
> > In 0.10 you should have bag -> tuple -> elements.
> >
> > 2012/7/17 Yang <[EMAIL PROTECTED]>
> >
> > > OK, found the issue.
> > >
> > > I no longer create an explicit FieldSchema for the inner tuple schema;
> > > instead I insert the tuple schema directly into the bag, and then it
> > > works.
> > >
> > > So this is indeed a difference between 0.8.1 and 0.10: the original
> > > version works on 0.10, and the new one only works on 0.8.1.
> > >
> > > On Tue, Jul 17, 2012 at 4:59 PM, Yang <[EMAIL PROTECTED]> wrote:
> > >
> > > > I created a UDF that returns a bag of tuples. The syntax is all fine,
> > > > but when I run it in Pig, Pig gives this error:
> > > > 12/07/17 16:51:58 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> > > > 12/07/17 16:51:58 WARN mapred.LocalJobRunner: job_local_0001
> > > > java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:392)
> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:342)
> > > >     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
> > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:237)
> > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
> > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> > > >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> > > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > > > 12/07/17 16:51:58 INFO mapReduceLayer.MapReduceLauncher: HadoopJobId: job_local_0001
> > > >
> > > > It looks like the returned value is wrong somehow, but I checked the
> > > > outputSchema() method and it is exactly the same as in the online docs.
> > > > Where am I going wrong?
> > > >
> > > > This is Pig 0.8.1. About a month ago I posted a question stating that
> > > > FLATTEN(bag_of_tuples) behaves differently in 0.8.1 than in 0.10.0:
> > > > 0.8.1 keeps the enclosing tuple, while 0.10.0 strips it and places the
> > > > fields at the root level.
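One plausible reading of the ClassCastException, sketched as a toy model in plain Java with no Pig classes (`SchemaMismatchDemo` and its methods are hypothetical). Suppose outputSchema() declares bag -> tuple -> elements but the UDF emits flat tuples. A projection that expects "field 0 is a tuple" then finds a String instead. Here rows are modeled as `List<Object>`, with a nested `List` standing in for an inner tuple.

```java
import java.util.List;

class SchemaMismatchDemo {
    // Toy model, not Pig's POProject: the declared schema says each row of the
    // bag holds an inner tuple at position 0, so the "projection" casts it.
    static List<?> projectInnerTuple(List<Object> row) {
        return (List<?>) row.get(0);
    }

    // Returns true when the cast fails, i.e. the data is flat (bag -> elements)
    // although the schema promised bag -> tuple -> elements.
    static boolean projectionFails(List<Object> row) {
        try {
            projectInnerTuple(row);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Object> nested = List.of(List.of("a", "b")); // row holding an inner tuple
        List<Object> flat = List.of("a", "b");             // flat row, as the UDF emitted it
        System.out.println(projectionFails(nested)); // prints false
        System.out.println(projectionFails(flat));   // prints true
    }
}
```

This is consistent with the fix reported in the thread (dropping the explicit FieldSchema wrapper on 0.8.1), but the thread does not confirm that this is exactly what POProject does.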
> > > >
> > > > Thanks!
> > > > yang
> > > >
> > > > ///// DemoUdf.java
> > > >
> > > > import java.io.IOException;
> > > >
> > > > import org.apache.pig.EvalFunc;
> > > > import org.apache.pig.data.DataBag;
> > > > import org.apache.pig.data.DataType;
> > > > import org.apache.pig.data.DefaultDataBag;
> > > > import org.apache.pig.data.DefaultTuple;
> > > > import org.apache.pig.data.Tuple;
> > > > import org.apache.pig.impl.logicalLayer.FrontendException;
> > > > import org.apache.pig.impl.logicalLayer.schema.Schema;
> > > >
> > > > public class DemoUdf  extends EvalFunc<DataBag> {