Pig >> mail # user >> Re: Using matches in generate clause?


Re: Using matches in generate clause?
With Pig 0.9 you can do this, though:

FOREACH html_pages GENERATE portal_id, (html matches 'some pattern' ? 1 : 0) as wp_match:int;
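
One thing to keep in mind with the flag above: as far as I know, matches follows Java regular expression semantics and has to match the whole string, not just a substring. Below is a small Java sketch of that behaviour. The class name MatchFlagSketch, the helper matchFlag, and the sample HTML are made up for illustration and are not part of Pig.

```java
import java.util.regex.Pattern;

public class MatchFlagSketch {
    // Hypothetical helper mirroring (html matches 'pattern' ? 1 : 0).
    // Pattern.matches succeeds only if the regex covers the entire input.
    static int matchFlag(String html, String regex) {
        if (html == null) {
            return 0; // simplification for this sketch: null counts as "no match"
        }
        return Pattern.matches(regex, html) ? 1 : 0;
    }

    public static void main(String[] args) {
        String html = "<html><body>powered by wordpress</body></html>";
        System.out.println(matchFlag(html, "wordpress"));     // prints 0: substring only
        System.out.println(matchFlag(html, ".*wordpress.*")); // prints 1: whole string covered
    }
}
```

So if 'some pattern' is meant to be found anywhere in html, it probably needs to be written as '.*some pattern.*'.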

On Thu, Sep 27, 2012 at 10:38 AM, Alan Gates <[EMAIL PROTECTED]> wrote:

> In Pig 0.9 boolean was not yet a first class data type, so boolean types
> were not allowed in foreach statements.  In Pig 0.10 boolean became a first
> class type, so expressions that return booleans (such as matches) should
> work.
>
> Alan.
>
>
> On Sep 27, 2012, at 10:34 AM, pablomar wrote:
>
> > No idea why, but matches works with FILTER and not with FOREACH.
> > I've tried with Pig 0.9.2.
> >
> > example (this works):
> > b = filter html_pages by html matches 'some pattern';
> >
> >
> > if you still want to do it with foreach, you can write your UDF,
> > something like:
> >
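> > import java.io.IOException;
> > import org.apache.pig.EvalFunc;
> > import org.apache.pig.data.Tuple;
> >
> > public class MyMatch extends EvalFunc<Boolean>
> > {
> >  // input is (pattern, value); true only if the whole value matches the regex
> >  public Boolean exec(Tuple input) throws IOException
> >  {
> >    try
> >    {
> >      String pattern = (String)input.get(0);
> >      String value = (String)input.get(1);
> >      // return null for null input instead of failing with a NullPointerException
> >      if (pattern == null || value == null) return null;
> >
> >      return value.matches(pattern);
> >    }
> >    catch(Exception e)
> >    {
> >      // plain IOException with a cause; avoids the deprecated WrappedIOException
> >      throw new IOException("ouch!", e);
> >    }
> >  }
> > }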
> >
> >
> > and use it just like this:
> >
> > b = foreach html_pages generate portal_id, MyMatch('some pattern', html) as wp_match;
> >
> >
> >
> >
> > On Thu, Sep 27, 2012 at 12:38 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
> >
> >> What version of Pig are you using?
> >>
> >> Alan.
> >>
> >> On Sep 27, 2012, at 8:54 AM, James Kebinger wrote:
> >>
> >>> Hello, I'm having some trouble doing something I thought would be easy:
> >>> I'd like to use matches to generate a boolean flag, but this does not
> >>> seem to compile:
> >>>
> >>> FOREACH html_pages GENERATE portal_id, html matches 'some pattern' as wp_match:boolean;
> >>>
> >>> I've tried wrapping it in parens too, with no luck.
> >>>
> >>> Is this possible, or am I out of luck?
> >>>
> >>> thanks
> >>
> >>
>
>