Pig >> mail # user >> How can I use load function to load bag field?


yonghu 2012-06-11, 16:07
Russell Jurney 2012-06-11, 17:33
yonghu 2012-06-11, 18:51
Russell Jurney 2012-06-11, 18:56
yonghu 2012-06-11, 19:07
Jonathan Coveney 2012-06-11, 20:15
Russell Jurney 2012-06-11, 20:32
Re: How can I use load function to load bag field?
Thanks, guys. I tried the code and found the right pattern for a bag
that can be loaded.

regards!

Yong
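
For readers landing on this thread: the message above does not say which bag pattern ended up working. As a hedged sketch, PigStorage's text form for a bag is a brace-wrapped, comma-separated list of parenthesized tuples, so a file laid out as in the comments below should load with a bag schema like the one in Russell's example. The file name students_bag.txt and its rows are hypothetical, made up for illustration; the field names val1 and val2 are borrowed from Russell's script.

```pig
-- Hypothetical input file 'students_bag.txt'. Each line holds a name, a tab,
-- and a bag written in PigStorage's text form: {(v1,v2),(v1,v2),...}
--
--   sally	{(1,82),(2,87)}
--   tom	{(1,82),(4,98)}
--   fred	{(2,120)}

-- Declare the second column as a bag of (int, int) tuples.
grouped = LOAD 'students_bag.txt' USING PigStorage('\t')
          AS (name:chararray, my_bag:bag{T2:tuple(val1:int, val2:int)});

-- Print the schema Pig inferred from the AS clause.
DESCRIBE grouped;

-- Unnest the bag again: one output row per tuple in my_bag.
flat = FOREACH grouped GENERATE name, FLATTEN(my_bag);
DUMP flat;
```

Data that is still flat (name, int, int on each line), like the sample quoted further down, does not match this layout. For that case Jonathan's advice applies: load the three columns as they are and build the bag with a GROUP inside the script.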

On Mon, Jun 11, 2012 at 10:32 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> my_data = LOAD 'location' AS (name:chararray, val1:int, val2:int);
> by_name = foreach (group my_data by name) generate group as name,
> my_data.(val1, val2) as my_data;
> store by_name into 'new_location';
>
> grouped_data = LOAD 'new_location' AS (name:chararray,
> my_bag:bag{T2:tuple(val1:int, val2:int)});
> -- Voilà!
>
> On Mon, Jun 11, 2012 at 1:15 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>
> > Yong,
> >
> > If your data is not in the form of a bag, then there is no reason to load
> > it in as a bag. You should load it in as chararray, int, int, and then you
> > can transform it into the form you want via the script itself.
> >
> > 2012/6/11 yonghu <[EMAIL PROTECTED]>
> >
> > > Dear Russell,
> > >
> > > My Pig version is 0.9.1. I have tried a little bit, but I ran into a
> > > problem. My data looks like:
> > >
> > > henrietta    1    25
> > > sally    1    82
> > > fred    2    120
> > > elsie    3    45
> > > tom    1    82
> > > tom    4    98
> > > sally    2    87
> > >
> > > the delimiter is '\t'.
> > >
> > > I use the command to load the data
> > >
> > > A = LOAD '/home/yonghu/test/student.txt' AS
> > > >> (name:chararray,B:{T1:(id:int,result:int)});
> > >
> > > then I got the following errors:
> > >
> > > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 2, column 42>
> > > mismatched input ';' expecting RIGHT_PAREN
> > > Details at logfile: /home/yonghu/pig-0.9.1/bin/pig_1339440832010.log
> > >
> > > What does RIGHT_PAREN mean here? Is there any requirement on the format
> > > of the input data?
> > >
> > > Thanks.
> > >
> > > Yong
> > >
> > > On Mon, Jun 11, 2012 at 8:56 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> > >
> > > > High five! o/\o
> > > >
> > > > On Mon, Jun 11, 2012 at 11:51 AM, yonghu <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Dear Russell,
> > > > >
> > > > > Thanks for your response.
> > > > >
> > > > > Yong
> > > > >
> > > > > On Mon, Jun 11, 2012 at 7:33 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Doesn't need a UDF (if it's PigStorage or something else supported),
> > > > > > just a cast.
> > > > > >
> > > > > > foo = LOAD 'location' as B:bag{T2:tuple(t1:float,t2:float)};
> > > > > >
> > > > > > Pulled from the docs: http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html
> > > > > >
> > > > > > A = LOAD 'mydata' AS (T1:tuple(f1:int, f2:int),
> > > > > > B:bag{T2:tuple(t1:float,t2:float)}, M:map[] );
> > > > > >
> > > > > > A = LOAD 'mydata' AS (T1:(f1:int, f2:int), B:{T2:(t1:float,t2:float)},
> > > > > > M:[] );
> > > > > >
> > > > > >
> > > > > > Russell Jurney
> > > > > > twitter.com/rjurney
> > > > > > [EMAIL PROTECTED]
> > > > > > datasyndrome.com
> > > > > >
> > > > > > On Jun 11, 2012, at 9:07 AM, yonghu <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > Dear All,
> > > > > >
> > > > > > How can I define a UDF load function to load a bag field? Such as
> > > > > > A = LOAD 'location' as (field_name : bag {}). Can anyone show me
> > > > > > some example code?
> > > > > >
> > > > > > Regards!
> > > > > >
> > > > > > Yong
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Russell Jurney twitter.com/rjurney [EMAIL PROTECTED]
> > > > datasyndrome.com
> > > >
> > >
> >
>
>
>
> --
> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED]
> datasyndrome.com
>