see exceptions when accessing a bag within a Tuple field of another tuple


Lin Guo 2010-10-12, 22:38
Re: see exceptions when accessing a bag within a Tuple field of another tuple
Hi Lin,
This does not seem to be a known issue. Can you please open a new JIRA?
FYI, I got a java.lang.NullPointerException when I tried running query 1
with the 0.7 and trunk versions.

Thanks,
Thejas

On 10/12/10 3:38 PM, "Lin Guo" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Our data contain tuples in which one field is itself a tuple that contains a
> bag field, and we see the following exceptions when we access that
> bag field:
>
> java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast to org.apache.pig.data.DataBag
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:479)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:197)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:477)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:197)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:336)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:433)
>         at
>
> We can reproduce the exceptions using the following scripts.
>
> 1. A = LOAD 'test_input' as (a:int, T:(list:{B:(key:int, value:int)}, world:chararray) );
> describe A;
> /*
> test_input contains:
> 12      ({(2,13),(4,5)}, 'hello')
> 24      ({(8,17),(9,11),(3,4)}, 'world')
>
> and got A's schema as:
> A: {a: int,T: (list: {B: (key: int,value: int)},world: chararray)}
> */
>
> B = FOREACH A GENERATE  T.list, T.world;
> describe B;
> /*
> got:
> B: {list: {B: (key: int,value: int)},world: chararray}
> */
>
> dump B;
>
> 2.
> ......
>
> b = foreach a generate member_id, primary_email, year_born;
> c = group b by member_id;
> d = foreach c generate group as member_id, b;
> e = group d by member_id;
> f = foreach e generate group as member_id, d;
> g = foreach f generate member_id as A, flatten(d);
>
> h = foreach g generate $0 as A, $1 AS B, $2 AS C;
> describe h;
> /* get the following schema:
> h: {A: int,B: int,C: {member_id: int,primary_email: chararray,year_born: int}}
> */
>
> h = foreach h generate $0 as A, Swap($1, $2) AS T;
> describe h;
> /* We use Swap to generate a tuple out of the last two fields and got
> the following schema
> h: {A: int,T: (C: {member_id: int,primary_email: chararray,year_born: int},B: int)}
> */
> g = foreach h generate A, T.C;
> describe g;
>
> g = limit g 15;
> dump g;
>
> Is this a known issue?
>
> Best,
> Lin
>
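
A possible workaround for script 1 is sketched below: FLATTEN the inner tuple T first so the bag becomes a top-level field before it is projected, instead of dereferencing T.list directly. This is only a sketch, under the assumption that a single projection avoids the POProject path that raises the ClassCastException; it has not been verified against the 0.7 or trunk builds mentioned above, and the disambiguated field names after FLATTEN (e.g. T::list, T::world) can vary by version, so positional references are used.

A = LOAD 'test_input' as (a:int, T:(list:{B:(key:int, value:int)}, world:chararray));

-- Flatten the inner tuple so its fields become top-level columns.
-- Depending on the Pig version, describe may report them as T::list and T::world.
A2 = FOREACH A GENERATE a, FLATTEN(T);

-- Project the bag and the chararray by position, without going through T.
B = FOREACH A2 GENERATE $1 AS list, $2 AS world;

describe B;
dump B;

If the cast error does not occur with this formulation, that would help narrow the JIRA down to the double dereference (T.list) as the trigger.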