Accumulo >> mail # dev >> Re: [jira] [Updated] (ACCUMULO-209) RegExFilter does not properly regex when using multi-byte characters


Did anyone grep the source to see if any other classes are using
ByteArrayBackedCharSequence? Should that class be removed or fixed?

On Fri, Dec 9, 2011 at 12:25 PM, Billie Rinaldi (Updated) (JIRA) <[EMAIL PROTECTED]> wrote:

>
>     [https://issues.apache.org/jira/browse/ACCUMULO-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Billie Rinaldi updated ACCUMULO-209:
> ------------------------------------
>
>     Resolution: Fixed
>        Status: Resolved  (was: Patch Available)
>
> > RegExFilter does not properly regex when using multi-byte characters
> > --------------------------------------------------------------------
> >
> >                 Key: ACCUMULO-209
> >                 URL: https://issues.apache.org/jira/browse/ACCUMULO-209
> >             Project: Accumulo
> >          Issue Type: Bug
> >          Components: client
> >    Affects Versions: 1.3.5
> >            Reporter: Jim Klucar
> >            Assignee: Billie Rinaldi
> >             Fix For: 1.4.0, 1.5.0
> >
> >         Attachments: accumulo-209-RegExFilter.patch,
> >                      accumulo-209-RegExFilterTest.patch, accumulo-209.patch
> >
> >   Original Estimate: 1h
> >  Remaining Estimate: 1h
> >
> > The current RegExFilter class uses a ByteArrayBackedCharSequence to set
> > the data to match against. The ByteArrayBackedCharSequence contains a line
> > of code that prevents the matcher from properly matching multi-byte
> > characters.
> > Line 49 of ByteArrayBackedCharSequence.java is:
> > return (char) (0xff & data[offset + index]);
> > This incorrectly casts a single byte from the byte array to a char,
> > which is 2 bytes in Java. This prevents the RegExFilter from properly
> > performing Regular Expressions on multi-byte character encoded values.
> > A patch for the RegExFilter.java file has been created and will be
> > submitted.
>
Keith Turner 2011-12-12, 15:39
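
For readers following the thread, the byte-to-char cast quoted in the issue can be reproduced with a small standalone sketch. It is not the Accumulo code or the attached patch: the class and method names below are hypothetical, and it assumes the stored value is UTF-8 encoded, which the issue does not state.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Accumulo.
class MultiByteRegexSketch {

    // Mimics the cast quoted in the issue: every byte becomes its own char,
    // so a multi-byte character is split into several unrelated chars.
    public static String perByteChars(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length);
        for (byte b : data) {
            sb.append((char) (0xff & b));
        }
        return sb.toString();
    }

    // Assumption: the value is UTF-8, so decode it before matching.
    public static String decodeUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e; the accented letter takes
        // two bytes in UTF-8, so the array holds five bytes for four characters.
        byte[] value = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        Pattern pattern = Pattern.compile("caf\u00e9");

        // Per-byte view: five chars, the regex does not match.
        System.out.println(pattern.matcher(perByteChars(value)).matches());
        // Decoded view: four chars, the regex matches.
        System.out.println(pattern.matcher(decodeUtf8(value)).matches());
    }
}
```

The per-byte view only agrees with the decoded view when every character fits in a single byte, which is why the problem shows up only with multi-byte data.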