Pig user mailing list: see exceptions when accessing a bag within a Tuple field of another tuple


Lin Guo 2010-10-12, 22:38
Re: see exceptions when accessing a bag within a Tuple field of another tuple
Hi Lin,
This does not seem to be a known issue. Could you please open a new JIRA?
FYI, I get a java.lang.NullPointerException when I run query 1
with either 0.7 or trunk.

Thanks,
Thejas

On 10/12/10 3:38 PM, "Lin Guo" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Our data contains tuples in which one field is itself a tuple that
> contains a bag field. We see the following exception when we access
> the bag field:
>
> java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast to org.apache.pig.data.DataBag
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:479)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:197)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:477)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:197)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:336)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:433)
>         at
>
> We can reproduce the exceptions using the following scripts.
>
> 1.
> A = LOAD 'test_input' as (a:int, T:(list:{B:(key:int, value:int)}, world:chararray) );
> describe A;
> /*
> test_input contains:
> 12      ({(2,13),(4,5)}, 'hello')
> 24      ({(8,17),(9,11),(3,4)}, 'world')
>
> and got A's schema as:
> A: {a: int,T: (list: {B: (key: int,value: int)},world: chararray)}
> */
>
> B = FOREACH A GENERATE  T.list, T.world;
> describe B;
> /*
> got:
> B: {list: {B: (key: int,value: int)},world: chararray}
> */
>
> dump B;
>
> 2.
> ......
>
> b = foreach a generate member_id, primary_email, year_born;
> c = group b by member_id;
> d = foreach c generate group as member_id, b;
> e = group d by member_id;
> f = foreach e generate group as member_id, d;
> g = foreach f generate member_id as A, flatten(d);
>
> h = foreach g generate $0 as A, $1 AS B, $2 AS C;
> describe h;
> /* get the following schema:
> h: {A: int,B: int,C: {member_id: int,primary_email: chararray,year_born: int}}
> */
>
> h = foreach h generate $0 as A, Swap($1, $2) AS T;
> describe h;
> /* We use Swap to generate a tuple out of the last two fields and get
> the following schema:
> h: {A: int,T: (C: {member_id: int,primary_email: chararray,year_born: int},B: int)}
> */
> g = foreach h generate A, T.C;
> describe g;
>
> g = limit g 15;
> dump g;
>
> Is this a known issue?
>
> Best,
> Lin
>
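
For readers following the thread, here is a rough plain-Python model of what query 1 appears intended to produce. It is only a sketch, not Pig code: the function name `project_t` is hypothetical, bags are modelled as Python lists and tuples as Python tuples, and the quotes around the strings in test_input are dropped for simplicity. It shows that projecting T.list should yield a bag of (key, value) tuples, whereas the ClassCastException above reports a tuple turning up where a bag was expected.

```python
# Rows of A modelled as (a, T), where T = (list, world)
# and list is a bag of (key, value) tuples.
rows = [
    (12, ([(2, 13), (4, 5)], "hello")),
    (24, ([(8, 17), (9, 11), (3, 4)], "world")),
]


def project_t(row):
    """Model of `FOREACH A GENERATE T.list, T.world` for a single row.

    Hypothetical helper: it unpacks the nested tuple T and returns
    its two fields, which is the shape `describe B` reports.
    """
    _a, (bag, world) = row
    return bag, world


projected = [project_t(r) for r in rows]

for bag, world in projected:
    # The first field stays a bag (list of tuples); it is not a single tuple.
    print(bag, world)
```

Under this model each output row keeps the whole bag as its first field, which matches the schema `B: {list: {B: (key: int,value: int)},world: chararray}` shown in the original message.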