Pig, mail # user - conditional and multiple generate inside foreach?

Re: conditional and multiple generate inside foreach?
Dexin Wang 2011-07-25, 16:49
wow, awesome, works great! Thanks Shawn!

On Mon, Jul 25, 2011 at 9:27 AM, Xiaomeng Wan <[EMAIL PROTECTED]> wrote:

> no, you want a bag. should be this:
>
> B = foreach A generate name,days_ago, FLATTEN(((days_ago == 1)?{('yesterday'),('week'),('month'),('quarter')}:((...)?:));
>
> On Mon, Jul 25, 2011 at 10:25 AM, Xiaomeng Wan <[EMAIL PROTECTED]> wrote:
> > maybe you can try something like this:
> >
> > B = foreach A generate name,days_ago, FLATTEN(((days_ago == 1)?{('yesterday','week','month','quarter')}:((...)?:));
> >
> > Shawn
> >
> > On Sat, Jul 23, 2011 at 7:44 PM, Raghu Angadi <[EMAIL PROTECTED]> wrote:
> >> I see 3 independent questions :
> >>
> >>  1. How can we pass the entire row tuple to a UDF, as in 'B = FOREACH A
> >> GENERATE myudf(A)', without knowing the schema? I don't know if that is
> >> possible. It does feel like it should be possible.
> >>
> >>  2. How can I return an augmented tuple? Your UDF can make a copy of the
> >> input tuple, add whatever you like to it, and return it. Maybe your
> >> question is not this simple.
> >>
> >>  3. How can I make a UDF produce multiple rows for one input row, as in
> >> your example:
> >>       - your UDF needs to return a bag of row tuples. For (b,1) it would
> >> return {(b,1,yesterday), (b,1,week), ... }
> >>       - your Pig script would flatten the output of the UDF:
> >>         B = foreach A generate FLATTEN( myUDF(name, days_ago) );
> >>
> >> Raghu.
> >>
> >> On Fri, Jul 22, 2011 at 6:10 PM, Dexin Wang <[EMAIL PROTECTED]> wrote:
> >>
> >>> Thanks. I'm not familiar with Python, but I write a bunch of UDFs in Java.
> >>>
> >>> One question though: how do I pass the entire tuple to the UDF? I mean, I
> >>> can't do something like this:
> >>>
> >>>    B = FOREACH A GENERATE myudf(A)
> >>>
> >>> Essentially, given a tuple, I want to enrich it by adding one more field,
> >>> and the value of the new field depends on the values of some existing
> >>> fields in the tuple.
> >>>
> >>> (a,1) -> (a,1,yesterday)
> >>>
> >>> how would I do that?
> >>>
> >>> I imagine I can do
> >>>   B = GROUP A BY random;
> >>>   C = FOREACH B GENERATE myudf(A);
> >>>
> >>> But I really don't like adding another GROUP BY here.
> >>>
> >>> On Fri, Jul 22, 2011 at 5:23 PM, Scott Foster <[EMAIL PROTECTED]> wrote:
> >>>
> >>> > Hi Dexin,
> >>> > This is the sort of thing I've started using Python UDFs for. See:
> >>> > http://wiki.apache.org/pig/UDFsUsingScriptingLanguages for examples
> >>> > of how to write the Python code.
> >>> >
> >>> > If your udf was implemented in Python you could then do this...
> >>> >
> >>> > register 'udfs.py' using jython as udf;
> >>> > ...
> >>> > B = FOREACH A generate name, udf.daysAgoString(days_ago);
> >>> >
> >>> > scott.
> >>> >
> >>> > On Fri, Jul 22, 2011 at 4:42 PM, Dexin Wang <[EMAIL PROTECTED]> wrote:
> >>> > > Is it possible to do a conditional and more than one generate inside
> >>> > > a foreach?
> >>> > >
> >>> > > For example, I have tuples like this (name, days_ago):
> >>> > >
> >>> > > (a,0)
> >>> > > (b,1)
> >>> > > (c,9)
> >>> > > (d,40)
> >>> > >
> >>> > > b shows up 1 day ago, so it belongs to all of the following:
> >>> > > yesterday, last week, last month, and last quarter. So I'd like to
> >>> > > turn the above into:
> >>> > >
> >>> > > (a,0,today)
> >>> > > (b,1,yesterday)
> >>> > > (b,1,week)
> >>> > > (b,1,month)
> >>> > > (b,1,quarter)
> >>> > > (c,9,month)
> >>> > > (c,9,quarter)
> >>> > > (d,40,quarter)
> >>> > >
> >>> > > I imagine/dream I could do something like this
> >>> > >
> >>> > > B = FOREACH A
> >>> > >  {
> >>> > >        if (days_ago <= 90) generate name,days_ago,'quarter';
> >>> > >        if (days_ago <= 30) generate name,days_ago,'month';
> >>> > >        if (days_ago <= 7)   generate name,days_ago,'week';
> >>> > >        if (days_ago == 1)   generate name,days_ago,'yesterday';
> >>> > >        if (days_ago == 0)   generate name,days_ago,'today';
> >>> > >  }
> >>> > >
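
For reference, the UDF route suggested in the thread (return a bag of row tuples, then FLATTEN it) could look roughly like the Jython sketch below. It is only an illustration under some assumptions: the function name periods and the schema string are made up here, and the bucketing follows the expected output listed in the original question, so days_ago == 0 yields only 'today' rather than every bucket the dreamed-up if-statements would emit. Pig normally provides the outputSchema decorator when the script is registered with jython; the fallback definition is there only so the file also runs as plain Python.

```python
# Hypothetical udfs.py; a sketch, not the code anyone in the thread actually used.
# Assumed usage from Pig, combining the register and FLATTEN lines from the thread:
#   register 'udfs.py' using jython as udf;
#   B = FOREACH A GENERATE FLATTEN(udf.periods(name, days_ago));

try:
    outputSchema  # supplied by Pig for Jython UDFs
except NameError:
    # Standalone fallback: a no-op decorator with the same call shape.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("periods:bag{t:tuple(name:chararray, days_ago:int, period:chararray)}")
def periods(name, days_ago):
    """Return a list (bag) of (name, days_ago, period) tuples for one input row."""
    if days_ago is None or days_ago < 0:
        return []
    if days_ago == 0:
        # Assumption: 'today' rows get no other labels, as in the expected output.
        return [(name, days_ago, 'today')]
    labels = []
    if days_ago == 1:
        labels.append('yesterday')
    if days_ago <= 7:
        labels.append('week')
    if days_ago <= 30:
        labels.append('month')
    if days_ago <= 90:
        labels.append('quarter')
    return [(name, days_ago, label) for label in labels]
```

Rows older than 90 days return an empty bag, so FLATTEN drops them from the output; whether that is the desired behavior is not stated in the thread.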