Pig >> mail # user >> Find function with regexp


Re: Find function with regexp
Cool, that sounds like a good idea.

Thanks Norbert!

--
regards,
Jakub Glapa
On Tue, Jun 19, 2012 at 1:22 AM, Norbert Burger <[EMAIL PROTECTED]> wrote:

> Any reason you can't wrap this regex with wildcards aligned with start-line
> and end-line anchors, i.e.:
>
> ^.*([^0-9])\1{3,}.*$
>
> Agreed that it would be nice if MATCHES was less greedy here, but perhaps
> this'll avoid you having to write your own UDF.
>
> Norbert
>
> On Mon, Jun 18, 2012 at 3:31 PM, Jakub Glapa <[EMAIL PROTECTED]>
> wrote:
>
> > Hi Norbert,
> > thanks for the tip.
> > I think the MATCHES operator won't work for me because it tries to match
> > the whole region.
> > In my case I'm interested in detecting the sequence anywhere in the
> > string.
> >
> > e.g.
> > abccccdef - filter out
> > abcdeeeef - filter out
> > aabcdeef - leave
> > 111111abcd - leave
> >
> > I want to filter out all strings containing a character repeated at least
> > 4 times in a row, unless that character is a digit.
> >
> > regexp for detecting those is: ([^0-9])\1{3,}
> > but it won't work with MATCHES
> >
> > I have a trivial working UDF that just calls
> > pattern().matcher().find()
> > but maybe there is something that I could use?
> >
> >
> > --
> > regards,
> > Jakub Glapa
> >
> >
> > On Mon, Jun 18, 2012 at 3:49 PM, Norbert Burger <[EMAIL PROTECTED]> wrote:
> >
> > > Jakub -- The MATCHES operator accepts regexes as input.  You can add a
> > > NOT to invert the logic.
> > >
> > > http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html
> > >
> > > Norbert
> > >
> > > On Mon, Jun 18, 2012 at 7:14 AM, Jakub Glapa <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > Hi all,
> > > > I found the 'matches' operator for pattern matching in Pig Latin.
> > > > I couldn't find one in the documentation, but is there something
> > > > similar for searching?
> > > > Basically in java world I would want to get the result of the
> > > > Matcher.find() method not Matcher.matches().
> > > > Will I have to end up writing my own UDF for that?
> > > >
> > > > Thanks for help.
> > > >
> > > > PS.
> > > > I'm trying to filter out strings with consecutive repeated characters.
> > > > I've constructed a regexp that detects them.
> > > > Now I just have to apply it somehow.
> > > >
> > > >
> > > > --
> > > > regards,
> > > > Jakub Glapa
> > > >
> > >
> >
>
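
For anyone landing on this thread later, here is a small standalone Java sketch of the difference discussed above: Matcher.find() looks for the pattern anywhere in the input, while Matcher.matches() requires the whole input to match, which is why wrapping the regex in ^.* and .*$ gives the same answer for single-line strings. The class and method names (RepeatedCharDemo, findAnywhere, bareMatches, wrappedMatches) are made up for illustration and are not part of Pig or of Jakub's UDF; only the two regexes and the four sample strings come from the thread.

```java
import java.util.regex.Pattern;

// Illustrative only: compares find() on the bare pattern with matches()
// on the bare and on the wrapped pattern suggested in the thread.
class RepeatedCharDemo {
    // A non-digit character followed by at least 3 more copies of itself.
    static final Pattern BARE = Pattern.compile("([^0-9])\\1{3,}");
    // The same pattern wrapped with wildcards and anchors.
    static final Pattern WRAPPED = Pattern.compile("^.*([^0-9])\\1{3,}.*$");

    // Search semantics: true if the pattern occurs anywhere in s.
    static boolean findAnywhere(String s) {
        return BARE.matcher(s).find();
    }

    // Whole-input semantics with the bare pattern.
    static boolean bareMatches(String s) {
        return BARE.matcher(s).matches();
    }

    // Whole-input semantics with the wrapped pattern.
    static boolean wrappedMatches(String s) {
        return WRAPPED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abccccdef", "abcdeeeef", "aabcdeef", "111111abcd"};
        for (String s : samples) {
            System.out.printf("%-12s find=%b bareMatches=%b wrappedMatches=%b%n",
                    s, findAnywhere(s), bareMatches(s), wrappedMatches(s));
        }
    }
}
```

One caveat with the wrapped form: by default `.` does not match line terminators in java.util.regex, so the two approaches can disagree on input that contains newlines.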