

Re: dereference bag of tuples of fields
Can you give an example of your data and the output you want from the Pig query?

That will help me understand what you want the query to do. It is not very clear to me from the schema and query alone.

-Thejas

On 7/30/10 3:10 PM, "Rodriguez, John" <[EMAIL PROTECTED]> wrote:

I have built a bag of tuples, where each tuple contains fields.

I am reading SequenceFiles and have written MyLoader to do this. To keep
the example simple, I reduced the full set of fields to a single one,
"isValid".

I am not sure how to apply a dereference operator to this.

A = LOAD '/data/NetFlowDigests/rk/DigestMessage/part-r-00000' using
MyLoader() AS (data: bag{t: tuple(isValid:int)});

DESCRIBE A;

A: {data: {t: (isValid: int)}}

All the ways I have tried to dereference it produce errors:

B = GROUP A BY (data.t);

2010-07-30 21:51:29,881 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1028: Access to the tuple (t) of the bag is disallowed. Only
access to the elements of the tuple in the bag is allowed.

B = GROUP A BY (data.t.isValid);

2010-07-30 21:54:11,157 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1028: Access to the tuple (t) of the bag is disallowed. Only
access to the elements of the tuple in the bag is allowed.

B = GROUP A BY (t.isValid);

2010-07-30 21:55:31,475 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1000: Error during parsing. Invalid alias: t in {data: {t:
(isValid: int)}}

What is the proper way to do this?

John Rodriguez
