Re: How can I use load function to load bag field?
my_data = LOAD 'location' AS (name:chararray, val1:int, val2:int);
by_name = foreach (group my_data by name) generate group as name,
my_data.(val1, val2) as my_data;
store by_name into 'new_location';

grouped_data = LOAD 'new_location' AS (name:chararray,
my_bag:bag{T2:tuple(val1:int, val2:int)});
-- Voilà!
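
Applied to the student.txt sample quoted below, the same load, group, store, reload pattern might look like the following sketch. It assumes tab-delimited input read with PigStorage; the output path 'student_by_name' and the alias names are placeholders, not anything from the thread.

```pig
-- Load the flat, tab-delimited rows as they are on disk.
students = LOAD '/home/yonghu/test/student.txt' USING PigStorage('\t')
           AS (name:chararray, id:int, result:int);

-- Build one bag of (id, result) tuples per name.
by_name = FOREACH (GROUP students BY name)
          GENERATE group AS name, students.(id, result) AS B;

-- 'student_by_name' is a hypothetical output path.
STORE by_name INTO 'student_by_name' USING PigStorage('\t');

-- Reload the stored data, this time declaring the bag schema.
A = LOAD 'student_by_name' USING PigStorage('\t')
    AS (name:chararray, B:bag{T1:tuple(id:int, result:int)});
DESCRIBE A;
```

PigStorage writes each bag as text such as {(1,82),(4,98)}, which is why the second LOAD can parse it back with a bag schema. The order of tuples inside a bag is not guaranteed.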

On Mon, Jun 11, 2012 at 1:15 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> Yong,
>
> If your data is not in the form of a bag, then there is no reason to load
> it in as a bag. You should load it in as chararray, int, int, and then you
> can transform it into the form you want via the script itself.
>
> 2012/6/11 yonghu <[EMAIL PROTECTED]>
>
> > Dear Russell,
> >
> > My Pig version is 0.9.1. I have tried a little bit, but I ran into a problem.
> > My data looks like this:
> >
> > henrietta    1    25
> > sally    1    82
> > fred    2    120
> > elsie    3    45
> > tom    1    82
> > tom    4    98
> > sally    2    87
> >
> > The delimiter is '\t'.
> >
> > I use this command to load the data:
> >
> > A = LOAD '/home/yonghu/test/student.txt' AS
> > >> (name:chararray,B:{T1:(id:int,result:int)});
> >
> > Then I got the following error:
> >
> > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 2, column 42>
> > mismatched input ';' expecting RIGHT_PAREN
> > Details at logfile: /home/yonghu/pig-0.9.1/bin/pig_1339440832010.log
> >
> > What does RIGHT_PAREN mean here? Is there any requirement on the input data?
> >
> > Thanks.
> >
> > Yong
> >
> > On Mon, Jun 11, 2012 at 8:56 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> >
> > > High five! o/\o
> > >
> > > On Mon, Jun 11, 2012 at 11:51 AM, yonghu <[EMAIL PROTECTED]> wrote:
> > >
> > > > Dear Russell,
> > > >
> > > > Thanks for your response.
> > > >
> > > > Yong
> > > >
> > > > On Mon, Jun 11, 2012 at 7:33 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Doesn't need a UDF (if it's PigStorage or something else supported),
> > > > > just a cast.
> > > > >
> > > > > foo = LOAD 'location' as B:bag{T2:tuple(t1:float,t2:float)};
> > > > >
> > > > > Pulled from the docs: http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html
> > > > >
> > > > > A = LOAD 'mydata' AS (T1:tuple(f1:int, f2:int),
> > > > > B:bag{T2:tuple(t1:float,t2:float)}, M:map[] );
> > > > >
> > > > > A = LOAD 'mydata' AS (T1:(f1:int, f2:int), B:{T2:(t1:float,t2:float)},
> > > > > M:[] );
> > > > >
> > > > >
> > > > > Russell Jurney
> > > > > twitter.com/rjurney
> > > > > [EMAIL PROTECTED]
> > > > > datasyndrome.com
> > > > >
> > > > > On Jun 11, 2012, at 9:07 AM, yonghu <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > Dear All,
> > > > >
> > > > > How can I define a UDF load function to load a bag field? For example,
> > > > > A = LOAD 'location' AS (field_name: bag{}). Can anyone show me some
> > > > > example code?
> > > > >
> > > > > Regards!
> > > > >
> > > > > Yong
> > > > >

--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com