NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig >> mail # user >> Re: Using matches in generate clause?


Re: Using matches in generate clause?
That was pig 0.10.

This line:
matched = FOREACH counts_raw GENERATE
    com.kebinger.pigbat.BYTES_TO_INT(key,0) as portal_id,
    (html matches '(?s).*generator" content="WordPress.*|.*wp-content.*') as wp_match:boolean;

gives me the error:
ERROR 1200: <file count_wordpress_pages.pig, line 18, column 93> Syntax error, unexpected symbol at or near 'html'

Taking off the parens gives:
ERROR 1200: <file count_wordpress_pages.pig, line 18, column 97> mismatched input 'matches' expecting SEMI_COLON

and converting to an int as suggested later in the thread:

matched = FOREACH counts_raw GENERATE
    com.kebinger.pigbat.BYTES_TO_INT(key,0) as portal_id,
    (html matches '(?s).*generator" content="WordPress.*|.*wp-content.*' ? 1 : 0) as wp_match:int;

does work, so the int approach is a nice workaround.
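The sketch below shows what the working `? 1 : 0` expression computes and why the pattern is written the way it is. It assumes that Pig's `matches` behaves like Java's `Matcher.matches()`, meaning the regex must match the whole string. The leading and trailing `.*` in the thread's pattern suggest that it does. `(?s)` turns on DOTALL, so `.` also matches newlines, which a multi-line HTML page needs. The class name `WordPressFlag`, the method `wpMatch`, and the sample HTML strings are hypothetical illustrations, not part of the thread's code.

```java
import java.util.regex.Pattern;

public class WordPressFlag {
    // Pattern copied from the thread. The leading (?s) enables DOTALL for the
    // whole pattern (both alternatives), so '.' also matches line breaks.
    static final Pattern WP = Pattern.compile(
        "(?s).*generator\" content=\"WordPress.*|.*wp-content.*");

    // Hypothetical helper mirroring (html matches '...' ? 1 : 0), assuming
    // Pig's matches is a whole-string match like Matcher.matches().
    public static int wpMatch(String html) {
        return WP.matcher(html).matches() ? 1 : 0;
    }

    public static void main(String[] args) {
        // Hypothetical sample pages spanning several lines.
        String wpPage = "<html>\n<meta name=\"generator\" content=\"WordPress 3.4\" />\n</html>";
        String plainPage = "<html>\n<p>hello</p>\n</html>";

        System.out.println(wpMatch(wpPage));    // 1
        System.out.println(wpMatch(plainPage)); // 0

        // Without (?s), '.*' cannot cross the first newline, so a whole-string
        // match against multi-line HTML fails even though the text is present.
        Pattern noDotAll = Pattern.compile(".*generator\" content=\"WordPress.*");
        System.out.println(noDotAll.matcher(wpPage).matches()); // false
    }
}
```

Downstream, a filter such as `FILTER matched BY wp_match == 1` recovers the boolean test in Pig without needing a boolean-typed column.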
On Thu, Sep 27, 2012 at 12:38 PM, Alan Gates <[EMAIL PROTECTED]> wrote:

> What version of Pig are you using?
>
> Alan.
>
> On Sep 27, 2012, at 8:54 AM, James Kebinger wrote:
>
> > Hello, I'm having some trouble doing something I thought would be easy: I'd
> > like to use matches to generate a boolean flag but this seems to not
> > compile:
> >
> > FOREACH html_pages GENERATE portal_id, html matches 'some pattern' as wp_match:boolean;
> >
> > I've tried wrapping it in parens too, with no luck.
> >
> > Is this possible, or am I out of luck?
> >
> > thanks
>
>