HBase >> mail # user >> Inconsistencies in comparisons using KeyComparator


Re: Inconsistencies in comparisons using KeyComparator
Looking at
http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/9b8c96f96a0f/src/share/classes/sun/misc/Unsafe.java,
it looks like Unsafe is provided by OpenJDK as well.

So I guess this issue, though disturbing, wouldn't show up in practice, since the pure-Java fallback would rarely be used.
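Whether sun.misc.Unsafe is present can be checked on any given JVM. A minimal sketch, assuming nothing beyond a Class.forName lookup by name; UnsafeCheck is a hypothetical class name, and a successful lookup only shows the class is loadable, not that HBase actually selected its Unsafe-based comparer.

```java
// Hypothetical helper: reports whether sun.misc.Unsafe can be loaded by name
// on the running JVM. It says nothing about which comparer HBase picked.
public class UnsafeCheck {

    /** Returns true if the class sun.misc.Unsafe can be loaded on this JVM. */
    public static boolean isUnsafeLoadable() {
        try {
            Class.forName("sun.misc.Unsafe");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.vendor = " + System.getProperty("java.vendor"));
        System.out.println("sun.misc.Unsafe loadable: " + isUnsafeLoadable());
    }
}
```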
On Mon, Apr 1, 2013 at 10:04 AM, Alan Chaney <[EMAIL PROTECTED]> wrote:

>
> On 4/1/2013 9:42 AM, Stack wrote:
>
>> That is an interesting (disturbing) find, Alan.  Hopefully the fallback is
>> rare.  Did you have a technique for making the compare fall back to the
>> pure Java compare?
>>
>> Thank you,
>> St.Ack
>>
>
> I agree it's disturbing! I based my findings on reading the source code for
> 0.92.1 (the CDH4.1.2 distro).
>
> It seems to me that, from org.apache.hadoop.hbase.KeyValue$KVComparator,
> the KeyComparator calls KeyComparator.compareRows, which in turn calls
>
> Bytes.compareTo(left, loffset, llength, right, roffset, rlength), which in
> turn calls Bytes.compareTo, which calls
> LexicographicalComparerHolder.BEST_COMPARER
>
> which appears to be implemented thus:
>
>   static class LexicographicalComparerHolder {
>     static final String UNSAFE_COMPARER_NAME =
>         LexicographicalComparerHolder.class.getName() + "$UnsafeComparer";
>
>     static final Comparer<byte[]> BEST_COMPARER = getBestComparer();
>     /**
>      * Returns the Unsafe-using Comparer, or falls back to the pure-Java
>      * implementation if unable to do so.
>      */
>     static Comparer<byte[]> getBestComparer() {
>       try {
>         Class<?> theClass = Class.forName(UNSAFE_COMPARER_NAME);
> ...
>     }
>
>     enum PureJavaComparer implements Comparer<byte[]> {
>       INSTANCE;
>
>       @Override
>       public int compareTo(byte[] buffer1, int offset1, int length1,
>    ...
>       }
>     }
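The excerpt elides the bodies of getBestComparer and PureJavaComparer.compareTo. The self-contained sketch below shows the same general pattern (look up an optimized implementation by class name, fall back to a pure-Java one if anything goes wrong), with hypothetical names throughout. ByteArrayComparer is an assumed stand-in for HBase's Comparer<byte[]>, FallbackComparer is an ordinary unsigned byte-by-byte lexicographic comparison rather than HBase's actual code, and no optimized class exists here, so the fallback is what gets selected.

```java
// Hypothetical sketch of a "best comparer" holder; not HBase source.
public class ComparerHolderSketch {

    /** Assumed stand-in for HBase's Comparer<byte[]> interface. */
    public interface ByteArrayComparer {
        int compareTo(byte[] b1, int off1, int len1, byte[] b2, int off2, int len2);
    }

    /** Pure-Java fallback: unsigned, byte-by-byte lexicographic comparison. */
    public enum FallbackComparer implements ByteArrayComparer {
        INSTANCE;

        @Override
        public int compareTo(byte[] b1, int off1, int len1,
                             byte[] b2, int off2, int len2) {
            int n = Math.min(len1, len2);
            for (int i = 0; i < n; i++) {
                // Mask with 0xff so bytes compare as 0..255, not -128..127.
                int a = b1[off1 + i] & 0xff;
                int b = b2[off2 + i] & 0xff;
                if (a != b) {
                    return a - b;
                }
            }
            // All shared bytes are equal: the shorter range sorts first.
            return len1 - len2;
        }
    }

    /** Hypothetical optimized enum that may or may not be on the classpath. */
    static final String OPTIMIZED_COMPARER_NAME =
        ComparerHolderSketch.class.getName() + "$OptimizedComparer";

    static final ByteArrayComparer BEST_COMPARER = getBestComparer();

    /** Try to load the optimized enum by name; use the fallback on any failure. */
    static ByteArrayComparer getBestComparer() {
        try {
            Class<?> theClass = Class.forName(OPTIMIZED_COMPARER_NAME);
            return (ByteArrayComparer) theClass.getEnumConstants()[0];
        } catch (Throwable t) {
            return FallbackComparer.INSTANCE;
        }
    }

    public static void main(String[] args) {
        // No $OptimizedComparer exists in this sketch, so this prints FallbackComparer.
        System.out.println(BEST_COMPARER.getClass().getSimpleName());
    }
}
```

Catching Throwable rather than only ClassNotFoundException is deliberate in this sketch: a missing class, a class that is not an enum, or an error while initialising it all degrade to the pure-Java path instead of breaking class initialisation.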
>
> So it looks to me like Unsafe is the default. However, it's not
> very easy to debug this, except by invoking
> KeyValue.KeyComparator and seeing what you get, which is what I did. Either
> I'm doing something very stupid (extremely plausible) or there is a bit of
> an issue here. I was hoping that someone would point out my error!
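Since the Unsafe path is hard to step through, one way to probe this class of bug is to cross-check a word-at-a-time comparison against a plain byte-by-byte one. The sketch below is an assumption-laden stand-in, not HBase's UnsafeComparer: it reads 8-byte words through java.nio.ByteBuffer instead of sun.misc.Unsafe, and WordCompareSketch and its methods are hypothetical names. It shows that an unsigned word comparison agrees with byte-wise lexicographic order only when the words are read big-endian; reading them little-endian (the native order on x86) without reversing the bytes can flip the sign, so any word-at-a-time comparer has to account for byte order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical cross-check of word-at-a-time vs byte-at-a-time comparison.
public class WordCompareSketch {

    /** Reference: unsigned byte-by-byte lexicographic comparison, returns -1, 0 or 1. */
    public static int compareBytewise(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff, y = b[i] & 0xff;
            if (x != y) return x < y ? -1 : 1;
        }
        return Integer.compare(a.length, b.length);
    }

    /** Compares 8 bytes at a time as longs read in the given byte order, then the tail bytewise. */
    public static int compareWords(byte[] a, byte[] b, ByteOrder order) {
        ByteBuffer ba = ByteBuffer.wrap(a).order(order);
        ByteBuffer bb = ByteBuffer.wrap(b).order(order);
        int n = Math.min(a.length, b.length);
        int i = 0;
        for (; i + 8 <= n; i += 8) {
            long wa = ba.getLong(i), wb = bb.getLong(i);
            if (wa != wb) {
                // Unsigned compare so words with the top bit set are not negative.
                return Long.compareUnsigned(wa, wb) < 0 ? -1 : 1;
            }
        }
        for (; i < n; i++) {
            int x = a[i] & 0xff, y = b[i] & 0xff;
            if (x != y) return x < y ? -1 : 1;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] left  = {0x01, 0, 0, 0, 0, 0, 0, 0x02};
        byte[] right = {0x02, 0, 0, 0, 0, 0, 0, 0x01};
        System.out.println("bytewise:         " + compareBytewise(left, right));                      // -1
        System.out.println("big-endian words: " + compareWords(left, right, ByteOrder.BIG_ENDIAN));    // -1
        System.out.println("little-endian:    " + compareWords(left, right, ByteOrder.LITTLE_ENDIAN)); // 1
    }
}
```

The little-endian read disagrees because the first byte, which decides lexicographic order, becomes the least significant byte of the long.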
>
> I've got some unit tests that appear to show the difference.
>
> Thanks
>
> Alan
>
>
>
>
>>
>> On Mon, Apr 1, 2013 at 7:54 AM, Alan Chaney <[EMAIL PROTECTED]>
>> wrote:
>>
>>  Hi
>>>
>>> I need to write some code that sorts row keys identically to HBase.
>>>
>>> I looked at the KeyValue.KeyComparator code, and it seems that, by
>>> default, HBase elects to use the 'Unsafe' comparator as the basis of its
>>> comparison, with a fallback to the PureJavaComparer should Unsafe not
>>> be available (for example, in tests).
>>>
>>> However, I'm finding that the sort order from a call to
>>> KeyValue.KeyComparator appears to be inconsistent between the two forms.
>>>
>>> As an example, comparing:
>>>
>>> (first param) (second param)
>>> 0000000000000000ffffffffffffffffffffffffffffffffff616c1b to
>>> 0000000000000000ffffffffffffffffffffffffffffffffff61741b
>>>
>>> gives 1 for the default (presumably Unsafe) call, and -1 using the
>>> PureJavaComparer.
>>>
>>> I would actually expect it to be a -ve number, based on the difference of
>>> 0x6c vs 0x74 in the second-to-last byte above.
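The expected sign can be worked out by hand from the first key pair. A minimal sketch, assuming the "****" markers are archive line-wrap debris and the keys are 8 zero bytes, 16 0xff bytes, then 61 6c 1b (or 61 74 1b); KeyPairExample, fromHex and firstDifference are hypothetical helpers, not HBase API.

```java
// Hypothetical worked example: locate the first differing byte of the two keys.
public class KeyPairExample {

    /** Decodes an even-length hex string into bytes (assumes valid hex digits). */
    public static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string: " + hex);
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Index of the first differing byte, or -1 if one array is a prefix of the other. */
    public static int firstDifference(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed reconstruction: 8 zero bytes followed by 16 0xff bytes.
        String prefix = "00000000" + "00000000"
            + "ffffffff" + "ffffffff" + "ffffffff" + "ffffffff";
        byte[] first = fromHex(prefix + "616c1b");
        byte[] second = fromHex(prefix + "61741b");
        int i = firstDifference(first, second);
        // Prints: length=27, first difference at index 25: 0x6c vs 0x74
        System.out.printf("length=%d, first difference at index %d: 0x%02x vs 0x%02x%n",
            first.length, i, first[i] & 0xff, second[i] & 0xff);
        // Prints: expected sign: -1
        System.out.println("expected sign: " + Integer.compare(first[i] & 0xff, second[i] & 0xff));
    }
}
```

Index 25 of a 27-byte key is the second-to-last byte, and since 0x6c < 0x74 an unsigned lexicographic comparison should come out negative.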
>>>
>>> Similarly
>>>
>>> 000000000000000000000000000000000000000000000000616c1b to
>>> 0000000000000000000000000000000000000000000000061741b
>>>
>>> gives > 0 instead of < 0. The PureJavaComparer does a byte-by-byte
>>> comparison.
>>>
>>> Is this expected? From the definition of lexicographical compare that I
>>> found, I don't think so. There's no issue of signed comparison here,
>>> because 0x6c and 0x74 are still +ve byte values.
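The signedness argument can be checked directly: for bytes below 0x80, signed and unsigned order agree, and they only diverge once a byte has its high bit set. A small sketch with hypothetical names (SignednessExample, compareSigned, compareUnsigned), using only Byte.compare and Byte.toUnsignedInt from the JDK.

```java
// Hypothetical illustration of signed vs unsigned byte ordering.
public class SignednessExample {

    /** Compares two bytes as Java's signed values (-128..127). */
    public static int compareSigned(byte a, byte b) {
        return Byte.compare(a, b);
    }

    /** Compares two bytes as unsigned values (0..255), as lexicographic byte[] order needs. */
    public static int compareUnsigned(byte a, byte b) {
        return Integer.compare(Byte.toUnsignedInt(a), Byte.toUnsignedInt(b));
    }

    public static void main(String[] args) {
        byte x = 0x6c, y = 0x74;               // both below 0x80
        byte p = (byte) 0x80, q = (byte) 0x7f; // 0x80 is -128 as a signed byte
        System.out.println("0x6c vs 0x74 signed:   " + Integer.signum(compareSigned(x, y)));   // -1
        System.out.println("0x6c vs 0x74 unsigned: " + Integer.signum(compareUnsigned(x, y))); // -1
        System.out.println("0x80 vs 0x7f signed:   " + Integer.signum(compareSigned(p, q)));   // -1
        System.out.println("0x80 vs 0x7f unsigned: " + Integer.signum(compareUnsigned(p, q))); // 1
    }
}
```

In the example keys the only bytes at or above 0x80 are the 0xff runs, which are identical in both keys, so in a byte-by-byte comparison they cannot decide the order.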
>>>
>>> Regards
>>>
>>> Alan
>>>
>>>
>>>
>>>
>