Accumulo >> mail # dev >> Setting Charset in getBytes() call.


Re: Setting Charset in getBytes() call.
On Mon, Oct 29, 2012 at 8:46 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
> +1 Mike.
>
> 1. It would be hard for me to believe Key/Value are ever handled internally
> in terms of Strings, but, if such a case does exist, it would be extremely
> prudent to fix.
>
> 2. FWIW, the Shell does use ISO-8859-1 as its charset which is referenced by
> other commands [1,2]. It would be good to double check all of the other
> commands.

I'm a bit lost. Any possible Java String can be rendered in UTF-8. So,
if you are calling String.getBytes to turn a string into some bytes
for some purpose, I think you need UTF-8.
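
To make that concrete, here is a minimal sketch of the String-to-bytes direction. The class and method names are made up for illustration, and it assumes Java 7+ for StandardCharsets (on older JVMs, Charset.forName("UTF-8") does the same job). It holds for any well-formed string; an unpaired surrogate is the one thing getBytes cannot encode and will replace.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Accumulo: shows that String -> UTF-8 bytes -> String
// is lossless when the charset is named explicitly on both sides.
public class Utf8RoundTrip {
    // Encode with an explicit charset instead of the platform default.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same charset that produced the bytes.
    static String decode(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Latin-1 range, CJK, and a supplementary-plane character (surrogate pair).
        String original = "caf\u00e9 \u65e5\u672c\u8a9e \uD83D\uDE00";
        String back = decode(encode(original));
        System.out.println("round trip preserved: " + original.equals(back));
    }
}
```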

On the other hand, as Mike pointed out, new String(somebytes, "utf-8")
will destroy data for some byte values that are not, in fact, UTF-8.
But why would Accumulo ever need to string-ify some array of bytes of
uncertain parentage?
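
For the record, a rough sketch of the lossy direction Mike described. The names are hypothetical and StandardCharsets assumes Java 7+. Bytes that are not valid UTF-8 get replaced during decoding, so they do not survive a trip through String, whereas ISO-8859-1 maps every byte value to exactly one char and back.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo, not Accumulo code: bytes -> String -> bytes under two charsets.
public class LossyDecode {
    // Decode raw bytes as text in the given charset, then encode them again.
    static byte[] roundTrip(byte[] raw, Charset cs) {
        return new String(raw, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE can never appear in well-formed UTF-8.
        byte[] raw = {(byte) 0xFF, (byte) 0xFE, 0x41};

        boolean utf8Ok = Arrays.equals(raw, roundTrip(raw, StandardCharsets.UTF_8));
        boolean latin1Ok = Arrays.equals(raw, roundTrip(raw, StandardCharsets.ISO_8859_1));

        System.out.println("UTF-8 preserved bytes:      " + utf8Ok);
        System.out.println("ISO-8859-1 preserved bytes: " + latin1Ok);
    }
}
```

That ISO-8859-1 happens to round-trip is no argument for pushing keys and values through String; it only explains why the Shell's choice does not corrupt bytes.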
>
> [1]
> https://github.com/apache/accumulo/blob/trunk/core/src/main/java/org/apache/accumulo/core/util/shell/Shell.java
> [2]
> https://github.com/apache/accumulo/blob/trunk/core/src/main/java/org/apache/accumulo/core/util/shell/commands/InsertCommand.java
>
>
> On 10/29/2012 8:27 PM, Michael Flester wrote:
>>
>> I agree with Benson entirely with one caveat. It seems to me that there
>> might be two categories of things being discussed
>>
>>    1. User data (keys and values)
>>    2. Ancillary things needed for operation of Accumulo (passwords).
>>
>> These could well be considered separately. Trying to do anything with
>> keys and values other than treating them as bytes all of the time
>> I find quite scary.
>>
>> And if this is only being done to satisfy pmd or findbugs, those tools
>> can be convinced to modify their reporting about this issue.
>>
>