Pig >> mail # user >> How can I use load function to load bag field?


Re: How can I use load function to load bag field?
Thanks, everyone. I tried the code and worked out the right pattern for a
bag that can be loaded.

regards!

Yong
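
For reference, a minimal sketch of the kind of bag pattern mentioned above,
assuming the default PigStorage loader with tab-delimited fields. The file
name students_bag.txt and its rows are hypothetical, regrouped from the sample
data quoted below. A bag is written as {(...),(...)}, with comma-separated
tuples inside the braces.

```pig
-- Hypothetical input file students_bag.txt, one record per line.
-- <TAB> stands for a literal tab character between the two fields:
--   sally<TAB>{(1,82),(2,87)}
--   tom<TAB>{(1,82),(4,98)}
--   fred<TAB>{(2,120)}

-- Assumption: no USING clause, so the default PigStorage loader is used
-- and the second field is converted to a bag according to the schema.
A = LOAD 'students_bag.txt' AS (name:chararray, B:bag{T1:tuple(id:int, result:int)});

-- Print the schema, then the loaded records.
DESCRIBE A;
DUMP A;
```

The flat three-column file would instead be loaded as (name:chararray,
id:int, result:int) and grouped in the script, as suggested in the replies.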

On Mon, Jun 11, 2012 at 10:32 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> my_data = LOAD 'location' AS (name:chararray, val1:int, val2:int);
> by_name = foreach (group my_data by name) generate group as name,
> my_data.(val1, val2) as my_data;
> store by_name into 'new_location';
>
> grouped_data = LOAD 'new_location' AS (name:chararray,
> my_bag:bag{T2:tuple(val1:int, val2:int)});
> -- Voilà!
>
> On Mon, Jun 11, 2012 at 1:15 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>
> > Yong,
> >
> > If your data is not in the form of a bag, then there is no reason to load
> > it in as a bag. You should load it in as chararray, int, int, and then you
> > can transform it into the form you want via the script itself.
> >
> > 2012/6/11 yonghu <[EMAIL PROTECTED]>
> >
> > > Dear Russell,
> > >
> > > My Pig version is 0.9.1. I tried it out, but ran into a problem. My
> > > data looks like this:
> > >
> > > henrietta    1    25
> > > sally    1    82
> > > fred    2    120
> > > elsie    3    45
> > > tom    1    82
> > > tom    4    98
> > > sally    2    87
> > >
> > > the delimiter is '\t'.
> > >
> > > I use this command to load the data:
> > >
> > > A = LOAD '/home/yonghu/test/student.txt' AS
> > > >> (name:chararray,B:{T1:(id:int,result:int)});
> > >
> > > then I got the following errors:
> > >
> > > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 2, column 42>
> > > mismatched input ';' expecting RIGHT_PAREN
> > > Details at logfile: /home/yonghu/pig-0.9.1/bin/pig_1339440832010.log
> > >
> > > What does RIGHT_PAREN mean here? Are there any requirements on the format
> > > of the input data?
> > >
> > > Thanks.
> > >
> > > Yong
> > >
> > > On Mon, Jun 11, 2012 at 8:56 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> > >
> > > > High five! o/\o
> > > >
> > > > On Mon, Jun 11, 2012 at 11:51 AM, yonghu <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Dear Russell,
> > > > >
> > > > > Thanks for your response.
> > > > >
> > > > > Yong
> > > > >
> > > > > On Mon, Jun 11, 2012 at 7:33 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Doesn't need a UDF (if it's PigStorage or something else supported),
> > > > > > just a cast.
> > > > > >
> > > > > > foo = LOAD 'location' as B:bag{T2:tuple(t1:float,t2:float)};
> > > > > >
> > > > > > Pulled from the docs: http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html
> > > > > >
> > > > > > A = LOAD 'mydata' AS (T1:tuple(f1:int, f2:int),
> > > > > > B:bag{T2:tuple(t1:float,t2:float)}, M:map[] );
> > > > > >
> > > > > > A = LOAD 'mydata' AS (T1:(f1:int, f2:int), B:{T2:(t1:float,t2:float)},
> > > > > > M:[] );
> > > > > >
> > > > > >
> > > > > > Russell Jurney
> > > > > > twitter.com/rjurney
> > > > > > [EMAIL PROTECTED]
> > > > > > datasyndrome.com
> > > > > >
> > > > > > On Jun 11, 2012, at 9:07 AM, yonghu <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > Dear All,
> > > > > >
> > > > > > How can I define a UDF load function to load a bag field? Such as
> > > > > > A = LOAD 'location' as (field_name : bag {}). Can anyone show me some
> > > > > > example code?
> > > > > >
> > > > > > Regards!
> > > > > >
> > > > > > Yong
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Russell Jurney twitter.com/rjurney [EMAIL PROTECTED]
> > > > datasyndrome.com
> > > >
> > >
> >
>
>
>
> --
> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED]
> datasyndrome.com
>