Pig, mail # user - dereference bag of tuples of fields


Re: dereference bag of tuples of fields
Thejas M Nair 2010-07-30, 22:38
Can you give an example of your data and the output you want from the Pig query?

That will help me understand what you want the query to do. From the schema and query alone, that is not very clear to me.

-Thejas

On 7/30/10 3:10 PM, "Rodriguez, John" <[EMAIL PROTECTED]> wrote:

I have built a bag of tuples, where the tuples contain fields.

I am reading SequenceFiles and have a custom loader, MyLoader, to do this. I
reduced the full set of fields to a single one, "isValid", to make the example
simpler.

I am not sure how to apply a dereference operator to this.

A = LOAD '/data/NetFlowDigests/rk/DigestMessage/part-r-00000' using
MyLoader() AS (data: bag{t: tuple(isValid:int)});

DESCRIBE A;

A: {data: {t: (isValid: int)}}

So far, every way that I have tried to dereference this fails with an error.

B = GROUP A BY (data.t);

2010-07-30 21:51:29,881 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1028: Access to the tuple (t) of the bag is disallowed. Only
access to the elements of the tuple in the bag is allowed.

B = GROUP A BY (data.t.isValid);

2010-07-30 21:54:11,157 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1028: Access to the tuple (t) of the bag is disallowed. Only
access to the elements of the tuple in the bag is allowed.

B = GROUP A BY (t.isValid);

2010-07-30 21:55:31,475 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1000: Error during parsing. Invalid alias: t in {data: {t:
(isValid: int)}}

What is the proper way to do this?

John Rodriguez
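
A possible approach, offered as a sketch and not as an answer confirmed in this thread: ERROR 1028 indicates that fields of the tuples inside a bag are addressed directly from the bag (for example data.isValid), not through the tuple alias t. To group on the field values themselves, one common pattern is to FLATTEN the bag first and then group the resulting records. The aliases B and C below are illustrative, and the snippet assumes MyLoader is registered and yields the schema shown by DESCRIBE A.

```pig
-- Assumes A has the schema reported above: {data: {t: (isValid: int)}}
A = LOAD '/data/NetFlowDigests/rk/DigestMessage/part-r-00000' USING
    MyLoader() AS (data: bag{t: tuple(isValid:int)});

-- Un-nest the bag so that each inner tuple becomes its own record.
B = FOREACH A GENERATE FLATTEN(data) AS (isValid:int);

-- Group the flattened records on the field value.
C = GROUP B BY isValid;

DESCRIBE C;
```

Flattening discards the association between an inner tuple and the bag it came from; if that association matters, an identifying field would need to be carried along in the FOREACH before grouping.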