Re: Using matches in generate clause?
That was Pig 0.10.

This line:
matched = FOREACH counts_raw GENERATE
    com.kebinger.pigbat.BYTES_TO_INT(key,0) as portal_id,
    (html matches '(?s).*generator" content="WordPress.*|.*wp-content.*') as wp_match:boolean;

gives me the error:
ERROR 1200: <file count_wordpress_pages.pig, line 18, column 93>  Syntax
error, unexpected symbol at or near 'html'

Taking off the parens gives:
ERROR 1200: <file count_wordpress_pages.pig, line 18, column 97>
mismatched input 'matches' expecting SEMI_COLON

and converting to an int as suggested later in the thread:

matched = FOREACH counts_raw GENERATE
    com.kebinger.pigbat.BYTES_TO_INT(key,0) as portal_id,
    (html matches '(?s).*generator" content="WordPress.*|.*wp-content.*' ? 1 : 0) as wp_match:int;

does work, so the int approach is a nice workaround.
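
For anyone who wants to sanity-check the pattern outside of Pig: as far as I know, Pig's matches uses Java regular expression syntax and has to match the whole string, which is why the pattern carries ".*" on both sides and "(?s)" so that "." also crosses newlines in the HTML. Below is a minimal Java sketch of the same test-then-map-to-1/0 logic. The class name WpMatchSketch and the method wpFlag are made up for illustration; they are not part of Pig or of my script.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: mirrors the workaround expression
// (html matches '<regex>' ? 1 : 0) with plain java.util.regex.
final class WpMatchSketch {
    // Same regex as in the script. (?s) turns on DOTALL so '.' matches newlines.
    static final Pattern WP = Pattern.compile(
        "(?s).*generator\" content=\"WordPress.*|.*wp-content.*");

    // Matcher.matches() requires the entire input to match the pattern,
    // hence the leading and trailing ".*" in each alternative.
    static int wpFlag(String html) {
        return (html != null && WP.matcher(html).matches()) ? 1 : 0;
    }

    public static void main(String[] args) {
        // Multi-line page with a WordPress generator meta tag.
        System.out.println(wpFlag(
            "<head>\n<meta name=\"generator\" content=\"WordPress 3.4\">\n</head>"));
        // Page that only references a wp-content path.
        System.out.println(wpFlag("<link href=\"/wp-content/themes/x.css\">"));
        // Page with neither marker.
        System.out.println(wpFlag("<html><body>plain page</body></html>"));
    }
}
```

The null check is my own assumption about how a missing html field should be treated (flag 0); the Pig expression may well yield null there instead.
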
On Thu, Sep 27, 2012 at 12:38 PM, Alan Gates <[EMAIL PROTECTED]> wrote:

> What version of Pig are you using?
>
> Alan.
>
> On Sep 27, 2012, at 8:54 AM, James Kebinger wrote:
>
> > Hello, I'm having some trouble doing something I thought would be easy: I'd
> > like to use matches to generate a boolean flag but this seems to not
> > compile:
> >
> > FOREACH html_pages GENERATE portal_id, html matches 'some pattern' as
> > wp_match:boolean;
> >
> > I've tried wrapping it in parens too, with no luck.
> >
> > Is this possible, or am I out of luck?
> >
> > thanks