NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig >> mail # user >> Find function with regexp


Jakub Glapa 2012-06-18, 11:14
Norbert Burger 2012-06-18, 14:49
Re: Find function with regexp
Hi Norbert,
thanks for the tip.
I think the MATCHES operator won't work for me because it tries to match
the whole input string.
In my case I'm interested in detecting the sequence anywhere in the string.
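To make the distinction concrete, here is a small plain-Java sketch contrasting Matcher.matches(), which is how the thread describes MATCHES behaving, with Matcher.find(). The class and helper names are hypothetical, not from the thread. matches() only succeeds when the pattern spans the entire input, as if it were anchored with ^ and $, while find() succeeds on any occurrence:

```java
import java.util.regex.Pattern;

public class FindVsMatches {
    // True only if the pattern spans the entire input (Matcher.matches()).
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the pattern occurs anywhere in the input (Matcher.find()).
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "abccccdef";
        // Four c's in a row are present, but they are not the whole string.
        System.out.println("matches(): " + wholeMatch("c{4}", input));    // false
        System.out.println("find():    " + containsMatch("c{4}", input)); // true
        // Anchoring the pattern makes find() behave like matches() here.
        System.out.println("find(^...$): " + containsMatch("^c{4}$", input)); // false
    }
}
```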

e.g.
abccccdef - filter out
abcdeeeef - filter out
aabcdeef - leave
111111abcd - leave

I want to filter out every string that contains a non-digit character
repeated at least 4 times in a row; repeated digits are fine.

The regexp for detecting those is: ([^0-9])\1{3,}
but it won't work with MATCHES, because the rest of the string around the
repeated run makes the whole-string match fail.
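One possible workaround, not confirmed anywhere in the visible part of this thread and only sketched here in plain Java: padding the regexp with .* on both sides turns a whole-string match into an "occurs anywhere" test. The (?s) flag is an addition of this sketch so that . also crosses line breaks. Whether a Pig MATCHES expression accepts exactly this string (for example, how backslashes must be escaped in a Pig string literal) is an assumption to verify; the class and constant names are hypothetical.

```java
import java.util.regex.Pattern;

public class PaddedMatches {
    // The regexp from the thread: a non-digit followed by at least three
    // more copies of itself, i.e. a run of four or more.
    static final String CORE = "([^0-9])\\1{3,}";

    // Hypothetical workaround: allow anything before and after the run so a
    // whole-string match succeeds whenever the run occurs anywhere.
    // (?s) is a non-capturing flag, so the back-reference \1 still points at
    // ([^0-9]); it also lets . match line terminators.
    static final String PADDED = "(?s).*" + CORE + ".*";

    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abccccdef", "abcdeeeef", "aabcdeef", "111111abcd"};
        for (String s : samples) {
            System.out.printf("%-10s core=%b padded=%b%n",
                    s, wholeMatch(CORE, s), wholeMatch(PADDED, s));
        }
    }
}
```

With the four sample strings, core is false for every one, while padded is true exactly for abccccdef and abcdeeeef.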

I have a trivial working UDF that just calls Pattern.compile(...).matcher(...).find(),
but maybe there is something built in that I could use?
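For reference, a minimal sketch of the core of such a find()-based filter, assuming the common approach of compiling the pattern once and calling find() per record. The Pig FilterFunc wrapper is deliberately left out so it compiles without the Pig jar; the class name, method name and null handling are all hypothetical, not Jakub's actual UDF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RepeatedCharFilter {
    // Compiled once and reused, rather than recompiled for every record.
    private static final Pattern REPEATED_NON_DIGIT = Pattern.compile("([^0-9])\\1{3,}");

    // Returns true if the string should be kept, i.e. it contains no run of
    // four or more identical characters other than 0-9.
    static boolean keep(String s) {
        if (s == null) {
            return false; // assumption: drop nulls; a real UDF may choose otherwise
        }
        return !REPEATED_NON_DIGIT.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {"abccccdef", "abcdeeeef", "aabcdeef", "111111abcd"};
        List<String> kept = new ArrayList<>();
        for (String s : samples) {
            if (keep(s)) {
                kept.add(s);
            }
        }
        System.out.println(kept); // [aabcdeef, 111111abcd]
    }
}
```

On the sample strings from this message it keeps only aabcdeef and 111111abcd, which is the intended filtering.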
--
regards,
Jakub Glapa
On Mon, Jun 18, 2012 at 3:49 PM, Norbert Burger <[EMAIL PROTECTED]> wrote:

> Jakub -- The MATCHES operator accepts regexes as input.  You can add a NOT
> to invert the logic.
>
> http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html
>
> Norbert
>
> On Mon, Jun 18, 2012 at 7:14 AM, Jakub Glapa <[EMAIL PROTECTED]> wrote:
>
> > Hi all,
> > I found a 'matches' operator for pattern matching in Pig Latin.
> > I didn't find it in the documentation, but maybe there is something
> > similar for searching?
> > Basically, in the Java world I would want the result of the
> > Matcher.find() method, not Matcher.matches().
> > Will I have to end up writing my own UDF for that?
> >
> > Thanks for help.
> >
> > PS.
> > I'm trying to filter out strings with consecutive repeated characters.
> > I've constructed a regexp that detects them.
> > Now I just have to apply it somehow.
> >
> >
> > --
> > regards,
> > Jakub Glapa
> >
>
Norbert Burger 2012-06-19, 00:22
Jakub Glapa 2012-06-19, 10:56