Pig >> mail # user >> conditional and multiple generate inside foreach?


Re: conditional and multiple generate inside foreach?
Wow, awesome, works great! Thanks, Shawn!

On Mon, Jul 25, 2011 at 9:27 AM, Xiaomeng Wan <[EMAIL PROTECTED]> wrote:

> No, you want a bag of four single-field tuples. It should be this:
>
> B = foreach A generate name,days_ago, FLATTEN(((days_ago == 1)?{('yesterday'),('week'),('month'),('quarter')}:((...)?:));
>
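The difference between Shawn's two attempts is what the bag contains. FLATTEN on a bag emits one output row per tuple in the bag, with that tuple's fields appended to the other GENERATE fields. The first version's bag holds a single four-field tuple, so (b,1) becomes one wide row; the corrected version holds four one-field tuples, so (b,1) becomes four rows. Below is a plain-Python model of that behavior (not Pig code); `flatten_bag` is a hypothetical helper used only for illustration.

```python
# Plain-Python model of what FLATTEN does to a bag inside FOREACH ... GENERATE.
# Illustration only; this is not how Pig implements it.

def flatten_bag(prefix, bag):
    """Emit one row per tuple in `bag`, each prefixed by the other GENERATE fields."""
    return [tuple(prefix) + tuple(t) for t in bag]

# Corrected version: a bag of four single-field tuples.
four_tuples = [('yesterday',), ('week',), ('month',), ('quarter',)]
# First attempt: a bag holding one four-field tuple.
one_tuple = [('yesterday', 'week', 'month', 'quarter')]

rows_ok = flatten_bag(('b', 1), four_tuples)    # four rows, three fields each
rows_wrong = flatten_bag(('b', 1), one_tuple)   # one row, six fields

for r in rows_ok:
    print(r)
print(rows_wrong)
```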
> On Mon, Jul 25, 2011 at 10:25 AM, Xiaomeng Wan <[EMAIL PROTECTED]> wrote:
> > Maybe you can try something like this:
> >
> > B = foreach A generate name,days_ago, FLATTEN(((days_ago == 1)?{('yesterday','week','month','quarter')}:((...)?:));
> >
> > Shawn
> >
> > On Sat, Jul 23, 2011 at 7:44 PM, Raghu Angadi <[EMAIL PROTECTED]> wrote:
> >> I see three independent questions:
> >>
> >>  1. How can we pass the entire row tuple to a UDF, as in
> >>     'B = FOREACH A GENERATE myudf(A)', without knowing the schema? I don't
> >>     know if that is possible. It does feel like it should be possible.
> >>
> >>  2. How can I return an augmented tuple? Your UDF can make a copy of the
> >>     input tuple, add whatever you like to it, and return it. Maybe your
> >>     question is not this simple.
> >>
> >>  3. How can I make a UDF produce multiple rows for each input row, as in
> >>     your example?
> >>       - Your UDF needs to return a bag of row tuples. For (b,1) it would
> >>         return {(b,1,yesterday), (b,1,week), ... }
> >>       - Your Pig script would flatten the output of the UDF:
> >>         B = foreach A generate FLATTEN( myUDF(name, days_ago) );
> >>
> >> Raghu.
> >>
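Raghu's points 2 and 3 together amount to a UDF that returns a bag of augmented copies of the input row. Here is a minimal sketch in Python (run by Pig through Jython, as Scott suggests) rather than Java, assuming `days_ago` arrives as an int. The function name `periodRows` and the schema field names are hypothetical. To reproduce the output listed in the original question, days_ago == 0 yields only 'today', and rows older than 90 days yield an empty bag.

```python
# udfs.py: hypothetical Jython UDF file, registered in Pig with
#   register 'udfs.py' using jython as udf;

try:
    outputSchema  # defined by Pig's Jython runtime when the file is registered
except NameError:
    # Fallback so the file can also be run and tested under plain Python.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator

@outputSchema("rows:bag{t:tuple(name:chararray,days_ago:int,period:chararray)}")
def periodRows(name, days_ago):
    """Return one augmented copy of (name, days_ago) per matching period."""
    if days_ago is None:
        return []
    if days_ago == 0:
        # Matches the requested output, where (a,0) yields only 'today'.
        return [(name, days_ago, 'today')]
    rows = []
    if days_ago == 1:
        rows.append((name, days_ago, 'yesterday'))
    if days_ago <= 7:
        rows.append((name, days_ago, 'week'))
    if days_ago <= 30:
        rows.append((name, days_ago, 'month'))
    if days_ago <= 90:
        rows.append((name, days_ago, 'quarter'))
    return rows  # a Python list of tuples becomes a Pig bag of tuples
```

After the same `register` line, it would be called as `B = FOREACH A GENERATE FLATTEN(udf.periodRows(name, days_ago));`. Flattening an empty bag produces no output row, so names older than 90 days simply drop out.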
> >> On Fri, Jul 22, 2011 at 6:10 PM, Dexin Wang <[EMAIL PROTECTED]> wrote:
> >>
> >>> Thanks. I'm not familiar with Python, but I write a bunch of UDFs in Java.
> >>>
> >>> One question, though: how do I pass the entire tuple to the UDF? I mean,
> >>> I can't do something like this:
> >>>
> >>>    B = FOREACH A GENERATE myudf(A)
> >>>
> >>> Essentially, given a tuple, I want to enrich it with one more field, and
> >>> the value of the new field depends on the values of some existing fields
> >>> in the tuple.
> >>>
> >>> (a,1) -> (a,1,yesterday)
> >>>
> >>> How would I do that?
> >>>
> >>> I imagine I can do
> >>>   B = GROUP A BY random;
> >>>   C = FOREACH B GENERATE myudf(A);
> >>>
> >>> But I really don't like adding another GROUP BY here.
> >>>
> >>> On Fri, Jul 22, 2011 at 5:23 PM, Scott Foster <[EMAIL PROTECTED]> wrote:
> >>>
> >>> > Hi Dexin,
> >>> > This is the sort of thing I've started using Python UDFs for. See
> >>> > http://wiki.apache.org/pig/UDFsUsingScriptingLanguages for examples of
> >>> > how to write the Python code.
> >>> >
> >>> > If your UDF were implemented in Python, you could then do this:
> >>> >
> >>> > register 'udfs.py' using jython as udf;
> >>> > ...
> >>> > B = FOREACH A generate name, udf.daysAgoString(days_ago);
> >>> >
> >>> > scott.
> >>> >
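Scott's message shows only the Pig side; the body of `daysAgoString` is not given. A hypothetical version might return the narrowest matching label, using the thresholds from Dexin's pseudocode and assuming non-negative integer input. Because it returns a single string, it yields exactly one output row per input row, so on its own it cannot produce the several rows per name that Dexin is after.

```python
# udfs.py: hypothetical body for the daysAgoString UDF named in Scott's reply.

try:
    outputSchema  # defined by Pig's Jython runtime when the file is registered
except NameError:
    # Fallback so the file can also be run and tested under plain Python.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator

@outputSchema("period:chararray")
def daysAgoString(days_ago):
    """Return the narrowest period label for days_ago, or None if older than 90 days."""
    if days_ago is None:
        return None
    if days_ago == 0:
        return 'today'
    if days_ago == 1:
        return 'yesterday'
    if days_ago <= 7:
        return 'week'
    if days_ago <= 30:
        return 'month'
    if days_ago <= 90:
        return 'quarter'
    return None  # becomes a null in Pig
```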
> >>> On Fri, Jul 22, 2011 at 4:42 PM, Dexin Wang <[EMAIL PROTECTED]> wrote:
> >>> > > Is it possible to do a conditional and more than one generate inside a
> >>> > > foreach?
> >>> > >
> >>> > > For example, I have tuples like this (name, days_ago):
> >>> > >
> >>> > > (a,0)
> >>> > > (b,1)
> >>> > > (c,9)
> >>> > > (d,40)
> >>> > >
> >>> > > b showed up 1 day ago, so it belongs to all of the following: yesterday,
> >>> > > last week, last month, and last quarter. So I'd like to turn the above
> >>> > > into:
> >>> > >
> >>> > > (a,0,today)
> >>> > > (b,1,yesterday)
> >>> > > (b,1,week)
> >>> > > (b,1,month)
> >>> > > (b,1,quarter)
> >>> > > (c,9,month)
> >>> > > (c,9,quarter)
> >>> > > (d,40,quarter)
> >>> > >
> >>> > > I imagine (or dream) I could do something like this:
> >>> > >
> >>> > > B = FOREACH A
> >>> > >  {
> >>> > >        if (days_ago <= 90) generate name,days_ago,'quarter';
> >>> > >        if (days_ago <= 30) generate name,days_ago,'month';
> >>> > >        if (days_ago <= 7)   generate name,days_ago,'week';
> >>> > >        if (days_ago == 1)   generate name,days_ago,'yesterday';
> >>> > >        if (days_ago == 0)   generate name,days_ago,'today';
> >>> > >  }
> >>> > >
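Read literally, with every `if` checked independently and each true condition emitting its own row, the imagined block would also emit (a,0,week), (a,0,month) and (a,0,quarter), since 0 <= 7, whereas the listed output keeps only (a,0,today). The Python sketch below makes that literal reading executable; `RULES` and `generate_all` are hypothetical names, and the rule table would need an exclusive days_ago == 0 case to match the listed output exactly.

```python
# Literal reading of the imagined FOREACH block: every condition is checked
# independently, and each one that holds emits its own (name, days_ago, label) row.
RULES = [
    ('quarter',   lambda d: d <= 90),
    ('month',     lambda d: d <= 30),
    ('week',      lambda d: d <= 7),
    ('yesterday', lambda d: d == 1),
    ('today',     lambda d: d == 0),
]

def generate_all(name, days_ago):
    """Return one row per rule whose condition holds, in rule order."""
    return [(name, days_ago, label) for label, cond in RULES if cond(days_ago)]

data = [('a', 0), ('b', 1), ('c', 9), ('d', 40)]
rows = [row for name, d in data for row in generate_all(name, d)]
for row in rows:
    print(row)
```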