Accumulo >> mail # dev >> Re: [jira] [Updated] (ACCUMULO-209) RegExFilter does not properly regex when using multi-byte characters

Re: [jira] [Updated] (ACCUMULO-209) RegExFilter does not properly regex when using multi-byte characters
Did anyone grep the source to see if any other classes are using
ByteArrayBackedCharSequence? Should that class be removed or fixed?

On Fri, Dec 9, 2011 at 12:25 PM, Billie Rinaldi (Updated) (JIRA) <
[EMAIL PROTECTED]> wrote:

>
>     [
> https://issues.apache.org/jira/browse/ACCUMULO-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Billie Rinaldi updated ACCUMULO-209:
> ------------------------------------
>
>     Resolution: Fixed
>        Status: Resolved  (was: Patch Available)
>
> > RegExFilter does not properly regex when using multi-byte characters
> > --------------------------------------------------------------------
> >
> >                 Key: ACCUMULO-209
> >                 URL: https://issues.apache.org/jira/browse/ACCUMULO-209
> >             Project: Accumulo
> >          Issue Type: Bug
> >          Components: client
> >    Affects Versions: 1.3.5
> >            Reporter: Jim Klucar
> >            Assignee: Billie Rinaldi
> >             Fix For: 1.4.0, 1.5.0
> >
> >         Attachments: accumulo-209-RegExFilter.patch,
> accumulo-209-RegExFilterTest.patch, accumulo-209.patch
> >
> >   Original Estimate: 1h
> >  Remaining Estimate: 1h
> >
> > The current RegExFilter class uses a ByteArrayBackedCharSequence to set
> the data to match against. The ByteArrayBackedCharSequence contains a line
> of code that prevents the matcher from properly matching multi-byte
> characters.
> > Line 49 of ByteArrayBackedCharSequence.java is:
> > return (char) (0xff & data[offset + index]);
> > This incorrectly casts a single byte from the byte array to a char,
> which is 2 bytes in Java. This prevents the RegExFilter from properly
> performing Regular Expressions on multi-byte character encoded values.
> > A patch for the RegExFilter.java file has been created and will be
> submitted.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators:
> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
>
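
For anyone checking other callers of ByteArrayBackedCharSequence, the behavior described in the issue can be reproduced with a minimal sketch. It is not the Accumulo source or the attached patch: the class name MultiByteRegexDemo and its helper methods are hypothetical, and it assumes the stored values are UTF-8 encoded. Only the cast in charAt is taken from the line quoted in the issue. The sketch compares matching against a per-byte view of the data with matching against the decoded string.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Pattern;

public class MultiByteRegexDemo {

    // Hypothetical stand-in for the behavior described in the issue:
    // every byte is widened to its own char, so a multi-byte UTF-8
    // character shows up as several unrelated chars.
    static CharSequence perByteView(final byte[] data) {
        return new CharSequence() {
            @Override
            public int length() {
                return data.length;
            }

            @Override
            public char charAt(int index) {
                // Same cast as the line quoted in the issue (offset omitted).
                return (char) (0xff & data[index]);
            }

            @Override
            public CharSequence subSequence(int start, int end) {
                return perByteView(Arrays.copyOfRange(data, start, end));
            }

            @Override
            public String toString() {
                StringBuilder sb = new StringBuilder(data.length);
                for (int i = 0; i < data.length; i++) {
                    sb.append(charAt(i));
                }
                return sb.toString();
            }
        };
    }

    // Matches the regex against the per-byte view of the data.
    static boolean matchesPerByte(String regex, byte[] data) {
        return Pattern.compile(regex).matcher(perByteView(data)).matches();
    }

    // Decodes the bytes first (UTF-8 is an assumption here), then matches.
    static boolean matchesDecoded(String regex, byte[] data) {
        String decoded = new String(data, StandardCharsets.UTF_8);
        return Pattern.compile(regex).matcher(decoded).matches();
    }

    public static void main(String[] args) {
        // "café": the final character is two bytes (0xC3 0xA9) in UTF-8.
        byte[] value = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println("per-byte view: " + matchesPerByte("caf\u00e9", value));
        System.out.println("decoded:       " + matchesDecoded("caf\u00e9", value));
    }
}
```

Pure ASCII values behave the same in both paths, which may be why the problem only surfaces with multi-byte data.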