Re: Setting Charset in getBytes() call.
We also need to make sure any string convenience classes use an encoding
scheme that still preserves a logical sort order (if that's an issue).
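A minimal Java sketch of the sorting concern. It assumes that keys are ordered by unsigned lexicographic byte comparison; the `compareUnsigned` helper below is illustrative and is not an Accumulo API. Under that assumption, UTF-8 bytes follow Unicode code point order. `String.compareTo`, by contrast, compares UTF-16 code units, so the two orders can disagree once supplementary characters (above U+FFFF) are involved.

```java
import java.nio.charset.StandardCharsets;

public class Utf8SortOrder {
    // Illustrative helper (not an Accumulo API): lexicographic comparison that
    // treats each byte as unsigned (0x00..0xFF), then shorter-is-smaller.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        String bmp = "\uFF21";                                         // U+FF21, fullwidth A
        String supplementary = new String(Character.toChars(0x1F600)); // U+1F600, a surrogate pair

        // UTF-16 code units: 0xFF21 vs high surrogate 0xD83D, so bmp sorts after
        int utf16Order = bmp.compareTo(supplementary);

        // UTF-8 bytes: EF BC A1 vs F0 9F 98 80, so bmp sorts first (code point order)
        int utf8Order = compareUnsigned(bmp.getBytes(StandardCharsets.UTF_8),
                                        supplementary.getBytes(StandardCharsets.UTF_8));

        System.out.println("String.compareTo sign: " + Integer.signum(utf16Order)); // 1
        System.out.println("UTF-8 byte order sign: " + Integer.signum(utf8Order));  // -1
    }
}
```

If convenience classes build keys from Strings, UTF-8 keeps byte order consistent with code point order. Anything relying on `String.compareTo` would need to account for the surrogate-pair difference.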

Sent from my phone, pardon the typos and brevity.
On Oct 29, 2012 9:57 PM, "Josh Elser" <[EMAIL PROTECTED]> wrote:

> I'm saying that I don't know of anything in the core API that performs a
> getBytes() on the data itself. Accumulo itself is agnostic, dealing only in
> byte[]. I think we're saying the same thing.
>
> On 10/29/2012 8:54 PM, Benson Margulies wrote:
>
>> On Mon, Oct 29, 2012 at 8:46 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
>>
>>> +1 Mike.
>>>
>>> 1. It would be hard for me to believe Key/Value are ever handled internally
>>> in terms of Strings, but, if such a case does exist, it would be extremely
>>> prudent to fix.
>>>
>>> 2. FWIW, the Shell does use ISO-8859-1 as its charset, which is referenced
>>> by other commands [1,2]. It would be good to double check all of the other
>>> commands.
>>>
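A small Java sketch of why ISO-8859-1 is a reasonable choice when bytes must be shown as a String and turned back into bytes. That charset maps each byte value 0x00-0xFF to exactly one char (U+0000-U+00FF), so bytes -> String -> bytes is lossless. UTF-8 decoding, on the other hand, replaces malformed sequences with U+FFFD. The class and method names are illustrative only; they are not taken from the Shell source.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // Illustrative helper: true if every possible byte value survives
    // bytes -> String -> bytes when the same charset is used both ways.
    static boolean roundTripsAllBytes(Charset cs) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String decoded = new String(all, cs);   // malformed input is replaced, not rejected
        byte[] reencoded = decoded.getBytes(cs);
        return Arrays.equals(all, reencoded);
    }

    public static void main(String[] args) {
        System.out.println("ISO-8859-1: " + roundTripsAllBytes(StandardCharsets.ISO_8859_1)); // true
        System.out.println("UTF-8:      " + roundTripsAllBytes(StandardCharsets.UTF_8));      // false
    }
}
```

The trade-off is display, not fidelity. Non-Latin-1 text typed into such a shell cannot be represented, but arbitrary binary keys and values pass through unchanged.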
>>
>> I'm a bit lost. Any possible Java String can be rendered in UTF-8. So,
>> if you are calling String.getBytes to turn a string into some bytes
>> for some purpose, I think you need UTF-8.
>>
>> On the other hand, as Mike pointed out, new String(somebytes, "utf-8")
>> will destroy data for some byte values that are not, in fact, UTF-8.
>> But why would Accumulo ever need to string-ify some array of bytes of
>> uncertain parentage?
>>
>>
>>
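Benson's point can be checked from the String side with a short Java sketch; the class and helper names are hypothetical. Any String, including one that contains a supplementary character, survives String -> UTF-8 -> String. ISO-8859-1 substitutes '?' for characters it cannot represent, so it is only safe in the bytes-first direction.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringRoundTrip {
    // Hypothetical helper: true if the String survives String -> bytes -> String.
    static boolean roundTrips(String s, Charset cs) {
        // getBytes(Charset) replaces unmappable characters with the charset's
        // replacement bytes ('?' for ISO-8859-1) instead of throwing.
        return s.equals(new String(s.getBytes(cs), cs));
    }

    public static void main(String[] args) {
        // ASCII, accented Latin, two CJK ideographs, and U+1F600 (a surrogate pair)
        String text = "caf\u00E9 \u6F22\u5B57 " + new String(Character.toChars(0x1F600));
        System.out.println("UTF-8:      " + roundTrips(text, StandardCharsets.UTF_8));      // true
        System.out.println("ISO-8859-1: " + roundTrips(text, StandardCharsets.ISO_8859_1)); // false
    }
}
```

Taken with the byte-side case Mike raised, the direction decides the charset. Use UTF-8 when the String is the source of truth. When the bytes are the source of truth, avoid decoding them at all.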
>>> [1] https://github.com/apache/accumulo/blob/trunk/core/src/main/java/org/apache/accumulo/core/util/shell/Shell.java
>>> [2] https://github.com/apache/accumulo/blob/trunk/core/src/main/java/org/apache/accumulo/core/util/shell/commands/InsertCommand.java
>>>
>>>
>>> On 10/29/2012 8:27 PM, Michael Flester wrote:
>>>
>>>>
>>>> I agree with Benson entirely, with one caveat. It seems to me that there
>>>> might be two categories of things being discussed:
>>>>
>>>>     1. User data (keys and values)
>>>>     2. Ancillary things needed for operation of Accumulo (passwords).
>>>>
>>>> These could well be considered separately. Trying to do anything with
>>>> keys and values other than treating them as bytes all of the time
>>>> I find quite scary.
>>>>
>>>> And if this is only being done to satisfy pmd or findbugs, those tools
>>>> can be convinced to modify their reporting about this issue.
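A minimal sketch of Mike's second category. It assumes a password is converted to bytes at some point; the helper is hypothetical and is not Accumulo code. `String.getBytes()` with no argument uses `Charset.defaultCharset()`, which can differ between hosts. Passing an explicit charset makes the byte form stable, and it also addresses the reliance-on-default-encoding pattern that tools such as FindBugs report.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PasswordCharsetSketch {
    // Hypothetical helper: converts ancillary text such as a password using an
    // explicit charset rather than the JVM default.
    static byte[] passwordBytes(String password, Charset cs) {
        return password.getBytes(cs);
    }

    public static void main(String[] args) {
        String password = "s\u00E9cret"; // contains U+00E9 (e with acute accent)
        byte[] utf8 = passwordBytes(password, StandardCharsets.UTF_8);        // 7 bytes
        byte[] latin1 = passwordBytes(password, StandardCharsets.ISO_8859_1); // 6 bytes
        // The same password yields different bytes under different charsets, so a
        // client and server relying on different platform defaults would disagree.
        System.out.println(utf8.length + " " + latin1.length);  // 7 6
        System.out.println(Arrays.equals(utf8, latin1));         // false
    }
}
```

User keys and values, the first category, would not go through a helper like this at all. They stay `byte[]` end to end.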
>>>>
>>>>
>>>