Pig >> mail # user >> Find function with regexp


Jakub Glapa 2012-06-18, 11:14
Norbert Burger 2012-06-18, 14:49
Jakub Glapa 2012-06-18, 19:31
Re: Find function with regexp
Any reason you can't wrap this regex with wildcards aligned with start-line
and end-line anchors, i.e.:

^.*([^0-9])\1{3,}.*$

Agreed that it would be nice if MATCHES didn't insist on matching the whole
string here, but perhaps this'll save you from writing your own UDF.

Norbert
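
For reference, a minimal plain-Java sketch of why the wrapped pattern should behave like the find() call in your UDF. The class and method names are made up for illustration, and it assumes single-line strings (the dot does not match line terminators by default):

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Pig or of the UDF mentioned in the thread.
public class RepeatedCharDemo {
    // Pattern from the thread: a non-digit followed by 3 or more copies of itself.
    static final Pattern RUN = Pattern.compile("([^0-9])\\1{3,}");
    // The same pattern wrapped in wildcards, so a whole-string match can succeed.
    static final Pattern WRAPPED = Pattern.compile("^.*([^0-9])\\1{3,}.*$");

    // Searches for the run anywhere in the string (Matcher.find()).
    static boolean foundAnywhere(String s) {
        return RUN.matcher(s).find();
    }

    // Requires the wrapped pattern to cover the whole string (Matcher.matches()).
    static boolean wrappedMatchesWhole(String s) {
        return WRAPPED.matcher(s).matches();
    }

    // Requires the bare pattern to cover the whole string (Matcher.matches()).
    static boolean bareMatchesWhole(String s) {
        return RUN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abccccdef", "abcdeeeef", "aabcdeef", "111111abcd"};
        for (String s : samples) {
            System.out.println(s
                + " find=" + foundAnywhere(s)
                + " wrapped=" + wrappedMatchesWhole(s)
                + " bare=" + bareMatchesWhole(s));
        }
    }
}
```

On the four sample strings the find and wrapped columns should agree, while the bare pattern with matches() rejects all of them. In a Pig script the backslash in the string literal may need to be doubled; I haven't checked that against a particular Pig version.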

On Mon, Jun 18, 2012 at 3:31 PM, Jakub Glapa <[EMAIL PROTECTED]> wrote:

> Hi Norbert,
> thanks for the tip.
> I think the MATCHES operator won't work for me because it tries to match
> the whole region.
> In my case I'm interested in detecting the sequence anywhere in the
> string.
>
> e.g.
> abccccdef - filter out
> abcdeeeef - filter out
> aabcdeef - leave
> 111111abcd - leave
>
> I want to filter out all strings in which a character is repeated at least
> 4 times in a row, unless that character is a digit.
>
> regexp for detecting those is: ([^0-9])\1{3,}
> but it won't work with MATCHES
>
> I have a trivial working UDF that just calls the pattern().matcher().find()
> but maybe there is something that I could use?
>
>
> --
> regards,
> Jakub Glapa
>
>
> On Mon, Jun 18, 2012 at 3:49 PM, Norbert Burger <[EMAIL PROTECTED]> wrote:
>
> > Jakub -- The MATCHES operator accepts regexes as input.  You can add a
> > NOT to invert the logic.
> >
> > http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html
> >
> > Norbert
> >
> > On Mon, Jun 18, 2012 at 7:14 AM, Jakub Glapa <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi all,
> > > I found a 'matches' operator for pattern matching in Pig Latin.
> > > I couldn't find one in the documentation, but maybe something similar
> > > exists for searching?
> > > In Java terms, I want the result of Matcher.find(), not
> > > Matcher.matches().
> > > Will I end up having to write my own UDF for that?
> > >
> > > Thanks for help.
> > >
> > > PS.
> > > I'm trying to filter out strings with consecutive repeated characters.
> > > I've constructed a regexp that detects them.
> > > Now I just have to apply it somehow.
> > >
> > >
> > > --
> > > regards,
> > > Jakub Glapa
> > >
> >
>
Jakub Glapa 2012-06-19, 10:56