-
Re: Using matches in generate clause?
Alan Gates 2012-09-27, 16:38
What version of Pig are you using?
Alan.
On Sep 27, 2012, at 8:54 AM, James Kebinger wrote:
> Hello, I'm having some trouble doing something I thought would be easy: I'd
> like to use matches to generate a boolean flag but this seems to not
> compile:
>
> FOREACH html_pages GENERATE portal_id, html matches 'some pattern' as
> wp_match:boolean;
>
> I've tried wrapping it in parens too, with no luck.
>
> Is this possible, or am I out of luck?
>
> thanks
-
Re: Using matches in generate clause?
pablomar 2012-09-27, 17:34
No idea why, but matches works with FILTER and not with FOREACH. I tried this with Pig 0.9.2.

Example (this works):

    b = filter html_pages by html matches 'some pattern';

If you still want to do it with foreach, you can write your own UDF, something like:

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.impl.util.WrappedIOException;

    public class MyMatch extends EvalFunc<Boolean> {
        // Expects two fields: the regex pattern first, then the value to test.
        @Override
        public Boolean exec(Tuple input) throws IOException {
            try {
                String pattern = (String) input.get(0);
                String value = (String) input.get(1);
                // String.matches succeeds only if the whole value matches.
                return value.matches(pattern);
            } catch (Exception e) {
                throw WrappedIOException.wrap("ouch!", e);
            }
        }
    }

and use it just like this:

    b = foreach html_pages generate portal_id, MyMatch('some pattern', html) as wp_match;
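The matching step inside a UDF like the one above can be tried out without a Pig installation. The following is only a sketch: `MatchCore` is a hypothetical helper, not part of Pig, and it assumes the pattern is fixed for the whole job so that it can be compiled once rather than on every call. Returning null for a null value is a design choice made here for illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Pig): the matching core of a MyMatch-style UDF. */
final class MatchCore {
    private final Pattern pattern;

    MatchCore(String regex) {
        // Compile once; String.matches would recompile the regex on every call.
        this.pattern = Pattern.compile(regex);
    }

    /** Returns null for a null value; otherwise true only if the whole value matches. */
    Boolean matches(String value) {
        if (value == null) {
            return null;
        }
        return pattern.matcher(value).matches();
    }

    public static void main(String[] args) {
        MatchCore m = new MatchCore(".*wp-content.*");
        System.out.println(m.matches("/wp-content/themes/x.css")); // true
        System.out.println(m.matches("plain page"));               // false
        System.out.println(m.matches(null));                       // null
        // Without the surrounding .* the whole string must equal the pattern.
        System.out.println(new MatchCore("wp-content").matches("/wp-content/")); // false
    }
}
```

The last call shows why the patterns in this thread are wrapped in `.*`: `Matcher.matches`, like `String.matches`, tests the entire string rather than searching for a substring.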
-
Re: Using matches in generate clause?
Alan Gates 2012-09-27, 17:38
In Pig 0.9 boolean was not yet a first class data type, so boolean types were not allowed in foreach statements. In Pig 0.10 boolean became a first class type, so expressions that return booleans (such as matches) should work.
Alan.
-
Re: Using matches in generate clause?
Dmitriy Ryaboy 2012-09-27, 19:31
With Pig 0.9 you can do this, though:
FOREACH html_pages GENERATE portal_id, (html matches 'some pattern' ? 1 : 0) as wp_match:int;
-
Re: Using matches in generate clause?
James Kebinger 2012-09-28, 21:52
That was pig 0.10.
This line:

    matched = FOREACH counts_raw GENERATE com.kebinger.pigbat.BYTES_TO_INT(key,0) as portal_id, (html matches '(?s).*generator" content="WordPress.*|.*wp-content.*') as wp_match:boolean;

gives me the error:

    ERROR 1200: <file count_wordpress_pages.pig, line 18, column 93> Syntax error, unexpected symbol at or near 'html'

Taking off the parens gives:

    ERROR 1200: <file count_wordpress_pages.pig, line 18, column 97> mismatched input 'matches' expecting SEMI_COLON

Converting to an int, as suggested elsewhere in the thread:

    matched = FOREACH counts_raw GENERATE com.kebinger.pigbat.BYTES_TO_INT(key,0) as portal_id, (html matches '(?s).*generator" content="WordPress.*|.*wp-content.*' ? 1 : 0) as wp_match:int;

does work, so the int approach is a nice workaround.
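The pattern in the message above starts with `(?s)`, the inline form of Java's DOTALL flag. The sketch below shows what that flag changes in plain Java, on the assumption that Pig's matches operator follows Java regular-expression semantics. `DotallDemo` and the sample HTML string are made up for illustration and are not from the thread.

```java
import java.util.regex.Pattern;

/** Hypothetical demo class; only the pattern itself comes from the thread. */
final class DotallDemo {
    // Pattern copied from the thread; the double quotes are escaped for a Java literal.
    static final String THREAD_PATTERN =
        "(?s).*generator\" content=\"WordPress.*|.*wp-content.*";

    /** True only if the regex matches the entire input. */
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Made-up multi-line page, as real HTML usually contains newlines.
        String html = "<html>\n<meta name=\"generator\" content=\"WordPress 3.4\">\n</html>";

        // With (?s), '.' also matches line terminators, so .* can span lines.
        System.out.println(fullMatch(THREAD_PATTERN, html)); // true

        // Without (?s), '.' stops at '\n' and the whole-string match fails.
        System.out.println(fullMatch(".*generator\" content=\"WordPress.*", html)); // false
    }
}
```

Dropping `(?s)` would therefore make the flag come out false for most multi-line pages, regardless of whether it is produced as a boolean or as an int.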