Pig >> mail # user >> Re: Using matches in generate clause?
Re: Using matches in generate clause?
With Pig 0.9 you can do this, though:

FOREACH html_pages GENERATE portal_id, (html matches 'some pattern' ? 1 : 0) as wp_match:int;
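
One caveat with the expression above: as far as I know, matches uses Java regular expressions and has to match the whole string, not just a part of it, so a bare substring pattern will give you 0 for every page. With multi-line HTML, '.' also stops at newlines unless you turn on the (?s) flag. Here is a rough plain-Java sketch of that behaviour. MatchFlag and flag are made-up names for illustration, not part of Pig, and the sample HTML is invented:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: mimics (html matches 'p' ? 1 : 0)
// under the assumption that matches is a whole-string Java regex match.
final class MatchFlag {
    // Returns 1 when the entire input matches the regex, otherwise 0.
    static int flag(String html, String regex) {
        return Pattern.matches(regex, html) ? 1 : 0;
    }

    public static void main(String[] args) {
        // Invented sample page spanning several lines.
        String html = "<html>\n<body>wp-content</body>\n</html>";

        // Bare substring: the whole string does not match.
        System.out.println(flag(html, "wp-content"));         // prints 0
        // '.' does not cross line breaks by default.
        System.out.println(flag(html, ".*wp-content.*"));     // prints 0
        // (?s) lets '.' match newlines too.
        System.out.println(flag(html, "(?s).*wp-content.*")); // prints 1
    }
}
```

So for whole HTML pages a pattern of the form '(?s).*some pattern.*' is probably what you want.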

On Thu, Sep 27, 2012 at 10:38 AM, Alan Gates <[EMAIL PROTECTED]> wrote:

> In Pig 0.9 boolean was not yet a first class data type, so boolean types
> were not allowed in foreach statements.  In Pig 0.10 boolean became a first
> class type, so expressions that return booleans (such as matches) should
> work.
>
> Alan.
>
>
> On Sep 27, 2012, at 10:34 AM, pablomar wrote:
>
> > no idea why, but matches works with FILTER but it doesn't with FOREACH
> > I've tried with pig 0.9.2
> >
> > example (this works):
> > b = filter html_pages by html matches 'some pattern';
> >
> >
> > if you still want to do it with foreach, you can write your UDF, something
> > like:
> >
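> > import java.io.IOException;
> > import org.apache.pig.EvalFunc;
> > import org.apache.pig.data.Tuple;
> >
> > public class MyMatch extends EvalFunc<Boolean>
> > {
> >  // input: (pattern, value); true only if the whole value matches the pattern
> >  public Boolean exec(Tuple input) throws IOException
> >  {
> >    try
> >    {
> >      String pattern = (String)input.get(0);
> >      String value = (String)input.get(1);
> >
> >      return value.matches(pattern);
> >    }
> >    catch(Exception e)
> >    {
> >      throw new IOException("ouch!", e);
> >    }
> >  }
> > }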
> >
> >
> > and use it just like this:
> >
> > b = foreach html_pages generate portal_id, MyMatch('some pattern', html) as wp_match;
> >
> >
> >
> >
> > On Thu, Sep 27, 2012 at 12:38 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
> >
> >> What version of Pig are you using?
> >>
> >> Alan.
> >>
> >> On Sep 27, 2012, at 8:54 AM, James Kebinger wrote:
> >>
> >>> Hello, I'm having some trouble doing something I thought would be easy:
> >>> I'd like to use matches to generate a boolean flag but this seems to not
> >>> compile:
> >>>
> >>> FOREACH html_pages GENERATE portal_id, html matches 'some pattern' as wp_match:boolean;
> >>>
> >>> I've tried wrapping it in parens too, with no luck.
> >>>
> >>> Is this possible, or am I out of luck?
> >>>
> >>> thanks
> >>
> >>
>
>