Accumulo, mail # dev - Setting Charset in getBytes() call.


Re: Setting Charset in getBytes() call.
John Vines 2012-10-29, 19:18
So perhaps we should have ISO-8859-1 as the standard. Mike, do you see any
reason to use something besides ISO-8859-1 for the encodings?

John

On Mon, Oct 29, 2012 at 3:14 PM, Michael Flester <[EMAIL PROTECTED]> wrote:

> > UTF-8 should always be present (according to the JLS), and as a multi-byte
> > format should be able to encode any character that you would need to.
> >
>
> UTF-8 cannot encode arbitrary data. Not all of the data that we store in
> Accumulo is characters. A safe encoding to use as a pass-through when you
> don't know whether you are dealing with characters is ISO-8859-1, since we
> know that we can make the round trip from bytes to string to bytes without loss.
>
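
The round-trip claim above can be checked with a small standalone sketch. It is not Accumulo code: the class name `CharsetRoundTrip` and its helper methods are made up for illustration, and it assumes Java 7 or later for `StandardCharsets`. It pushes every possible byte value through bytes -> String -> bytes under each charset and compares the result with the input.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Accumulo.
public class CharsetRoundTrip {

    // Decode the bytes into a String with the given charset, then encode
    // that String back into bytes with the same charset.
    public static byte[] roundTrip(byte[] input, Charset charset) {
        return new String(input, charset).getBytes(charset);
    }

    // Build an array holding every possible byte value, 0x00 through 0xFF.
    public static byte[] allByteValues() {
        byte[] bytes = new byte[256];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) i;
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] data = allByteValues();

        // ISO-8859-1 maps each byte to exactly one char (U+0000..U+00FF),
        // so the round trip returns the original bytes.
        boolean latin1Lossless =
            Arrays.equals(data, roundTrip(data, StandardCharsets.ISO_8859_1));

        // Many byte sequences are not valid UTF-8; the decoder substitutes
        // U+FFFD for them, so the original bytes cannot be recovered.
        boolean utf8Lossless =
            Arrays.equals(data, roundTrip(data, StandardCharsets.UTF_8));

        System.out.println("ISO-8859-1 lossless: " + latin1Lossless); // true
        System.out.println("UTF-8 lossless: " + utf8Lossless);        // false
    }
}
```

Passing the charset explicitly also avoids depending on the platform default, which is what a bare `getBytes()` call uses.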