Accumulo >> mail # dev >> Setting Charset in getBytes() call.


Re: Setting Charset in getBytes() call.
So perhaps we should have ISO-8859-1 as the standard. Mike, do you see any
reason to use something besides ISO-8859-1 for the encodings?

John

On Mon, Oct 29, 2012 at 3:14 PM, Michael Flester <[EMAIL PROTECTED]> wrote:

> > UTF-8 should always be present (according to the JLS), and as a multi-byte
> > format should be able to encode any character that you would need to.
> >
>
> UTF-8 cannot encode arbitrary data. Not all of the data that we store in
> Accumulo is characters. A safe encoding to use as a pass-through when you
> don't know whether you are dealing with characters is ISO-8859-1, since we
> know that we can make the round trip from bytes to string to bytes without loss.
>
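
The round-trip point quoted above can be checked with a small sketch. This is only an illustration, not code from Accumulo: the class name CharsetRoundTrip and the helper roundTrip are hypothetical, and it assumes Java 7 or later for java.nio.charset.StandardCharsets. It decodes a few arbitrary bytes to a String and encodes them back, once with ISO-8859-1 and once with UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Accumulo.
class CharsetRoundTrip {

    // Decode bytes to a String with the given charset, then encode back.
    static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // Arbitrary binary data; 0x80, 0xC3 0xFF are not valid UTF-8 sequences.
        byte[] raw = {(byte) 0x00, (byte) 0x7F, (byte) 0x80, (byte) 0xC3, (byte) 0xFF};

        byte[] viaLatin1 = roundTrip(raw, StandardCharsets.ISO_8859_1);
        byte[] viaUtf8 = roundTrip(raw, StandardCharsets.UTF_8);

        // ISO-8859-1 maps every byte value to exactly one char, so nothing is lost.
        System.out.println("ISO-8859-1 preserved: " + Arrays.equals(raw, viaLatin1));

        // Invalid UTF-8 input is replaced with U+FFFD during decoding,
        // so the re-encoded bytes differ from the original.
        System.out.println("UTF-8 preserved:      " + Arrays.equals(raw, viaUtf8));
    }
}
```

The String constructor and getBytes overloads that take a Charset substitute a replacement character for malformed input instead of throwing, which is why the UTF-8 loss is silent here.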