Pig >> mail # user >> dereference bag of tuples of fields


Rodriguez, John 2010-07-30, 22:10
Thejas M Nair 2010-07-30, 22:38
Rodriguez, John 2010-07-31, 16:35
Scott Carey 2010-07-31, 16:39
Rodriguez, John 2010-08-01, 14:48

Re: dereference bag of tuples of fields
If you are loading data through PigStorage (the default loader when you
don't specify one), the tuples inside a bag must be separated by commas,
so your data should look like this:

cat data
{(1,1,1)}
{(2,2,2),(3,3,3)}
{(4,4,4),(5,5,5),(6,6,6)}

then
grunt> A = LOAD 'data' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});
grunt> C = foreach A generate B.t1, B.t2, B.t3;
grunt> dump C;

({(1)},{(1)},{(1)})
({(2),(3)},{(2),(3)},{(2),(3)})
({(4),(5),(6)},{(4),(5),(6)},{(4),(5),(6)})
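
If you want one row per tuple rather than one bag per field, FLATTEN should un-nest the bag for you. A minimal sketch, assuming the same comma-separated 'data' file and schema as above (the aliases D and E are only illustrative names, not anything from your script):

```pig
-- Assumes the comma-separated 'data' file and the schema shown above.
A = LOAD 'data' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});

-- FLATTEN un-nests the bag, emitting one row per tuple in B.
-- The AS clause gives the resulting top-level fields short names.
D = FOREACH A GENERATE FLATTEN(B) AS (t1, t2, t3);

-- After flattening, t1, t2 and t3 are ordinary fields, so they can be
-- used directly as a grouping key.
E = GROUP D BY t1;
DUMP E;
```

With this approach each tuple becomes its own record, so operators such as GROUP or FILTER work on t1, t2 and t3 without any bag dereference.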
Ashutosh
On Sun, Aug 1, 2010 at 07:48, Rodriguez, John <[EMAIL PROTECTED]> wrote:
> Does this mean there is no way to access the fields t1, t2, t3?
>
> cat data
> {(1,1,1)}
> {(2,2,2)(3,3,3)}
> {(4,4,4)(5,5,5)(6,6,6)}
>
> A = LOAD 'data' AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});
>
> From: Scott Carey [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, July 31, 2010 9:39 AM
> To: [EMAIL PROTECTED]; Rodriguez, John
> Subject: Re: dereference bag of tuples of fields
>
>
>
> data.isValid
>
> All bags are bags of tuples.  The tuple is intrinsic and invisible at
> the syntax level, although it is visible to UDFs.  If you nest one more
> tuple inside that intrinsic tuple, Pig gets confused.  So 'bag.field' is
> actually a double dereference: one for the bag and one for the
> intrinsic tuple.
>
> ----- Reply message -----
> From: "Rodriguez, John" <[EMAIL PROTECTED]>
> Date: Fri, Jul 30, 2010 3:11 pm
> Subject: dereference bag of tuples of fields
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>
> I have built a bag of tuples where the tuples contain fields.
>
> I am reading SequenceFiles and have written MyLoader to do this. I
> created a subset of all the fields, "isValid", to make the example
> simpler.
>
> I am not sure how to apply a dereference operator to this.
>
>
>
> A = LOAD '/data/NetFlowDigests/rk/DigestMessage/part-r-00000' using
> MyLoader() AS (data: bag{t: tuple(isValid:int)});
>
> DESCRIBE A;
>
> A: {data: {t: (isValid: int)}}
>
>
>
> So all the ways that I have tried to dereference have syntax errors.
>
>
>
> B = GROUP A BY (data.t);
>
> 2010-07-30 21:51:29,881 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1028: Access to the tuple (t) of the bag is disallowed. Only
> access to the elements of the tuple in the bag is allowed.
>
>
>
> B = GROUP A BY (data.t.isValid);
>
> 2010-07-30 21:54:11,157 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1028: Access to the tuple (t) of the bag is disallowed. Only
> access to the elements of the tuple in the bag is allowed.
>
>
>
> B = GROUP A BY (t.isValid);
>
> 2010-07-30 21:55:31,475 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1000: Error during parsing. Invalid alias: t in {data: {t:
> (isValid: int)}}
>
>
>
> What is the proper way to do this?
>
>
>
> John Rodriguez
>
>
>
>
Rodriguez, John 2010-08-02, 17:04
Rodriguez, John 2010-08-02, 19:35
Xiaomeng Wan 2010-08-03, 17:16