Re: pig 0.8.1 - Iterating contents of a Bag
Amit,

It looks like the FLATTEN operator is exactly what you're looking for
(based on both the 'output you'd like to see' and the fact that your UDF
accepts chararrays and not bags).

I'm not sure I understand your question about iterating over bags. Do you
want to call your UDF on each tuple in the bag without flattening it first,
so that the bag is transformed in place and the data stays grouped?

If that is the case, you can't do it in Pig 0.8... a FOREACH nested inside
another FOREACH block only arrived in a later release (Pig 0.10, if I
remember right). That being said, if you want your
data to be transformed in place without having to flatten first and then
regroup, you could either rewrite your UDF to accept bags instead of
chararrays, or write a wrapper UDF that calls your existing UDF.
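
For reference, here is a rough sketch of the flatten-then-regroup route
mentioned above. The jar name (myudfs.jar), the UDF class
(com.example.ProcessKeyword) and the input path ('input') are placeholders
I made up; substitute your own. I have not run this against your data, so
treat it as an outline rather than a tested script.

```pig
-- Placeholder names: myudfs.jar, com.example.ProcessKeyword, 'input'.
REGISTER myudfs.jar;
DEFINE ProcessKeyword com.example.ProcessKeyword();

A = LOAD 'input' AS (key:chararray, keywords:{t:(keyword:chararray)});

-- FLATTEN un-nests the bag: one output row per (key, keyword) pair.
B = FOREACH A GENERATE key, FLATTEN(keywords.keyword) AS keyword;

-- The existing chararray UDF now receives one keyword at a time.
C = FOREACH B GENERATE key, ProcessKeyword(keyword) AS processed;

-- Regroup by key to get a bag of processed values per key again.
D = GROUP C BY key;
E = FOREACH D GENERATE group AS key, C.processed AS keywords;
```

Two things to keep in mind: rows whose bag is empty disappear at the
FLATTEN step, and rows that share the same key end up merged into one bag
after the GROUP.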

On Tue, Jul 23, 2013 at 4:25 PM, Amit <[EMAIL PROTECTED]> wrote:

> Thanks for the quick response.
> However, I do not want to flatten, because I plan to invoke a previously
> written UDF, which accepts a chararray, on each value in the bag.
>
> I am not sure whether this is possible at all with 0.8.1, but I thought I
> would seek the views of the experts on this mailing list.
>
>
> Regards,
> Amit
>
>  From: Serega Sheypak <[EMAIL PROTECTED]>
>
> To: [EMAIL PROTECTED]; Amit <[EMAIL PROTECTED]>
> Sent: Tuesday, July 23, 2013 4:23 PM
> Subject: Re: pig 0.8.1 - Iterating contents of a Bag
>
>
>
> Hi, I'm new to Pig, but I will try to help you.
> B = FOREACH A {
>     GENERATE FLATTEN(keywords.keyword) as keyword;
> };
>
>
> OR
> B = FOREACH A {
>     GENERATE FLATTEN(keywords.keyword) as (keyword);
> };
>
>
> You need to flatten the bag.
>
>
>
>
> 2013/7/24 Amit <[EMAIL PROTECTED]>
>
> Hello there,
> >I am loading data in the form of
> >
> >A1: {key: chararray,keywords: {keywords_tuple: (keyword: chararray)}}
> >
> >I believe the sample data would look like the following:
> >
> >(1,{('amit'),('yahoo'),('pig')})
> >
> >I am trying to write a FOREACH that loops through each keyword in the
> >bag.
> >
> >I tried writing this, but it does not dump the output the way I want to
> >see it:
> >
> >
> >B = FOREACH A {
> >    GENERATE keywords.keyword;
> >};
> >
> >I would like to see
> >
> >('amit')
> >('yahoo')
> >('pig')
> >
> >Instead it prints the entire bag at once like the one below.
> >
> >{('amit'),('yahoo'),('pig')}
> >
> >
> >
> >Please note that I do not want to flatten the bag, because I want to
> >process each keyword in the bag using a UDF later on.
> >
> >I would appreciate any input.
> >
> >Regards,
> >Amit
> >
>