Pig user mailing list: Find function with regexp


Jakub Glapa 2012-06-18, 11:14
Norbert Burger 2012-06-18, 14:49
Jakub Glapa 2012-06-18, 19:31
Re: Find function with regexp
Is there any reason you can't wrap this regex in wildcards, anchored to the
start and end of the line, i.e.:

^.*([^0-9])\1{3,}.*$

Agreed that it would be nice if MATCHES didn't insist on matching the whole
string here, but perhaps this will save you from having to write your own UDF.

Norbert
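Norbert's suggestion can be checked outside Pig with plain java.util.regex. The sketch below assumes, as Jakub describes, that MATCHES behaves like Matcher.matches() (the whole string must match); the class name AnchoredRegexCheck is hypothetical. For the example strings from the thread it compares find() with the bare pattern, matches() with the bare pattern, and matches() with the wrapped pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class. Assumption: Pig's MATCHES behaves like
// Matcher.matches(), i.e. the whole string has to match the pattern.
public class AnchoredRegexCheck {
    // Jakub's pattern: a non-digit followed by 3+ repeats of itself.
    static final Pattern BARE = Pattern.compile("([^0-9])\\1{3,}");
    // Norbert's variant: the same pattern wrapped in ^.* and .*$
    static final Pattern WRAPPED = Pattern.compile("^.*([^0-9])\\1{3,}.*$");

    // Search anywhere in the string (the find()-based UDF approach).
    static boolean bareFind(String s) {
        return BARE.matcher(s).find();
    }

    // Whole-string match with the bare pattern (plain MATCHES, by assumption).
    static boolean bareMatches(String s) {
        return BARE.matcher(s).matches();
    }

    // Whole-string match with the wrapped pattern (Norbert's workaround).
    static boolean wrappedMatches(String s) {
        return WRAPPED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abccccdef", "abcdeeeef", "aabcdeef", "111111abcd"};
        for (String s : samples) {
            System.out.printf("%-12s find=%-5b bareMatches=%-5b wrappedMatches=%b%n",
                    s, bareFind(s), bareMatches(s), wrappedMatches(s));
        }
    }
}
```

For these four inputs, find() on the bare pattern and matches() on the wrapped pattern agree (the first two strings are flagged, the last two are not), while matches() on the bare pattern is false for all four, which is why plain MATCHES didn't work. One caveat: in Java, . does not match line terminators unless DOTALL is set, so the two approaches can disagree on multi-line values.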

On Mon, Jun 18, 2012 at 3:31 PM, Jakub Glapa <[EMAIL PROTECTED]> wrote:

> Hi Norbert,
> thanks for the tip.
> I think the MATCHES operator won't work for me because it tries to match
> the whole string.
> In my case I'm interested in detecting the sequence anywhere in the
> string.
>
> e.g.
> abccccdef - filter out
> abcdeeeef - filter out
> aabcdeef - leave
> 111111abcd - leave
>
> I want to filter out all strings containing a character repeated at least
> 4 times in a row, unless that character is a digit.
>
> The regexp for detecting those is: ([^0-9])\1{3,}
> but it won't work with MATCHES.
>
> I have a trivial working UDF that just calls pattern().matcher().find(),
> but maybe there is something built in that I could use instead?
>
>
> --
> regards,
> Jakub Glapa
>
>
> On Mon, Jun 18, 2012 at 3:49 PM, Norbert Burger <[EMAIL PROTECTED]> wrote:
>
> > Jakub -- The MATCHES operator accepts regexes as input. You can add a
> > NOT to invert the logic.
> >
> > http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html
> >
> > Norbert
> >
> > On Mon, Jun 18, 2012 at 7:14 AM, Jakub Glapa <[EMAIL PROTECTED]> wrote:
> >
> > > Hi all,
> > > I found a 'matches' operator in Pig Latin for pattern matching.
> > > I didn't find it in the documentation, but maybe something similar
> > > exists for searching?
> > > Basically, in Java terms I want the result of the Matcher.find() method,
> > > not Matcher.matches().
> > > Will I end up having to write my own UDF for that?
> > >
> > > Thanks for the help.
> > >
> > > PS.
> > > I'm trying to filter out strings with consecutive repeated characters.
> > > I've constructed a regexp that detects them.
> > > Now I just have to apply it somehow.
> > >
> > >
> > > --
> > > regards,
> > > Jakub Glapa
> > >
> >
>
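The UDF Jakub mentions isn't shown in the thread. As a rough sketch of its core, assuming it simply wraps a find() call, the hypothetical class RepeatedCharFilter below compiles the pattern once and keeps a string only if no run of 4 or more identical non-digit characters appears anywhere in it. Wiring it into Pig (for instance from the exec(Tuple) method of a class extending org.apache.pig.FilterFunc) is left out so the example compiles with the JDK alone; dropping null values is also an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch of a find()-based filter. A real Pig UDF would call
// keep() from a class extending org.apache.pig.FilterFunc (not shown here).
public final class RepeatedCharFilter {
    // Compiled once and reused: a non-digit followed by 3+ copies of itself,
    // i.e. the same non-digit character at least 4 times in a row.
    private static final Pattern REPEATED_NON_DIGIT = Pattern.compile("([^0-9])\\1{3,}");

    private RepeatedCharFilter() {
    }

    // Returns true if the value should be kept: no such run anywhere in it.
    // Assumption: null values are dropped rather than kept.
    public static boolean keep(String value) {
        if (value == null) {
            return false;
        }
        // find() searches for the pattern anywhere, unlike matches().
        return !REPEATED_NON_DIGIT.matcher(value).find();
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("abccccdef", "abcdeeeef", "aabcdeef", "111111abcd");
        List<String> kept = input.stream()
                .filter(RepeatedCharFilter::keep)
                .collect(Collectors.toList());
        System.out.println(kept); // prints [aabcdeef, 111111abcd]
    }
}
```

Compiling the Pattern once as a static field avoids recompiling the regex for every record, which matters when the filter runs over a large relation.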
Jakub Glapa 2012-06-19, 10:56