Pig, mail # user - Re: Using matches in generate clause?


Alan Gates 2012-09-27, 16:38
James Kebinger 2012-09-28, 21:52
pablomar 2012-09-27, 17:34
Re: Using matches in generate clause?
Alan Gates 2012-09-27, 17:38
In Pig 0.9, boolean was not yet a first-class data type, so boolean expressions could not be used in FOREACH statements. In Pig 0.10, boolean became a first-class type, so expressions that return booleans (such as matches) should work.

Alan.
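When generating the flag with `matches`, keep in mind that, as far as I know, Pig's `matches` uses Java regular expressions with whole-string semantics (the Pig docs use patterns like `'.*apache.*'` for this reason). So `html matches 'some pattern'` is true only if the entire field matches, not if the field merely contains the text. The plain-Java sketch below (the class `FullMatchDemo` is hypothetical, not part of Pig) contrasts a full match with a substring search, and shows why `(?s)` may be needed when the HTML spans several lines.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: contrasts full-string matching (the semantics
// assumed here for Pig's `matches`) with a substring search.
public class FullMatchDemo {
    // True only if the ENTIRE input matches the regex,
    // the same semantics as String.matches and Matcher.matches().
    public static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex matches anywhere inside the input.
    public static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String html = "<p>some pattern here</p>";
        System.out.println(fullMatch("some pattern", html));      // false
        System.out.println(fullMatch(".*some pattern.*", html));  // true
        System.out.println(containsMatch("some pattern", html));  // true

        // By default '.' does not match line breaks; (?s) (DOTALL) changes that.
        String multiLine = "<div>\nsome pattern\n</div>";
        System.out.println(fullMatch(".*some pattern.*", multiLine));     // false
        System.out.println(fullMatch("(?s).*some pattern.*", multiLine)); // true
    }
}
```

Under that assumption, a "contains" flag in Pig 0.10 would use a pattern such as `'.*some pattern.*'`, or `'(?s).*some pattern.*'` if the field can contain line breaks.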
On Sep 27, 2012, at 10:34 AM, pablomar wrote:

> No idea why, but matches works with FILTER and doesn't work with FOREACH.
> I've tried with Pig 0.9.2.
>
> Example (this works):
> b = filter html_pages by html matches 'some pattern';
>
>
> If you still want to do it with FOREACH, you can write your own UDF, something
> like:
>
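> import java.io.IOException;
>
> import org.apache.pig.EvalFunc;
> import org.apache.pig.data.Tuple;
>
> // Returns true if the value (field 1) fully matches the regex (field 0).
> public class MyMatch extends EvalFunc<Boolean>
> {
>  @Override
>  public Boolean exec(Tuple input) throws IOException
>  {
>    // Pig convention: return null for missing input instead of failing.
>    if (input == null || input.size() < 2)
>      return null;
>    try
>    {
>      String pattern = (String)input.get(0);
>      String value = (String)input.get(1);
>      if (pattern == null || value == null)
>        return null;
>      // String.matches requires the whole value to match the pattern.
>      return value.matches(pattern);
>    }
>    catch(Exception e)
>    {
>      // Wrap bad casts or invalid regexes so Pig reports the failure.
>      throw new IOException("MyMatch: failed to evaluate input", e);
>    }
>  }
> }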
>
>
> Then, once the jar containing it has been REGISTERed, use it like this:
>
> b = foreach html_pages generate portal_id, MyMatch('some pattern', html) as wp_match;
>
> On Thu, Sep 27, 2012 at 12:38 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
>
>> What version of Pig are you using?
>>
>> Alan.
>>
>> On Sep 27, 2012, at 8:54 AM, James Kebinger wrote:
>>
>>> Hello, I'm having some trouble doing something I thought would be easy:
>>> I'd like to use matches to generate a boolean flag, but this doesn't seem
>>> to compile:
>>>
>>> FOREACH html_pages GENERATE portal_id, html matches 'some pattern' as
>>> wp_match:boolean;
>>>
>>> I've tried wrapping it in parens too, with no luck.
>>>
>>> Is this possible, or am I out of luck?
>>>
>>> thanks
>>
>>
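The `MyMatch` UDF quoted above calls `value.matches(pattern)`, which recompiles the regex on every row. When the same pattern is passed for every row, as in `MyMatch('some pattern', html)`, caching the compiled `Pattern` avoids that repeated work. Below is a minimal plain-Java sketch of the matching core, kept free of Pig classes so it can be tested on its own; the class name `CachedMatcher` is hypothetical and not part of Pig or the original thread.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the matching core of a MyMatch-style UDF.
// It remembers the last regex it compiled, so a constant pattern is compiled once.
// Not thread-safe: assumes calls come from a single thread.
public class CachedMatcher {
    private String lastRegex;
    private Pattern lastPattern;

    // Same contract as the UDF: null in, null out; otherwise a full-string match.
    public Boolean matches(String regex, String value) {
        if (regex == null || value == null) {
            return null;
        }
        if (!regex.equals(lastRegex)) {
            // Compile first, so a bad regex leaves the cache unchanged.
            lastPattern = Pattern.compile(regex);
            lastRegex = regex;
        }
        return lastPattern.matcher(value).matches();
    }

    public static void main(String[] args) {
        CachedMatcher m = new CachedMatcher();
        System.out.println(m.matches("some pattern", "some pattern")); // true
        System.out.println(m.matches("some pattern", "other text"));   // false
        System.out.println(m.matches("some pattern", null));           // null
    }
}
```

One way to use it (an assumption, not something from the thread) is to keep a `CachedMatcher` field on the UDF and have `exec` return `matcher.matches(pattern, value)` after reading the two fields. This relies on each UDF instance's `exec` being called from a single thread, which is the usual case in Pig.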
Dmitriy Ryaboy 2012-09-27, 19:31