Accumulo >> mail # dev >> Setting Charset in getBytes() call.


+
David Medinets 2012-10-28, 21:50
+
Ed Kohlwey 2012-10-28, 22:18
+
William Slacum 2012-10-29, 15:39
+
David Medinets 2012-10-29, 16:00
+
Josh Elser 2012-10-29, 16:21
+
Benson Margulies 2012-10-29, 16:24
+
John Vines 2012-10-29, 16:42
+
Josh Elser 2012-10-29, 16:57
+
David Medinets 2012-10-29, 17:00
+
William Slacum 2012-10-29, 17:13
+
Mike Drob 2012-10-29, 17:16
+
Michael Flester 2012-10-29, 19:14
+
John Vines 2012-10-29, 19:18
+
Benson Margulies 2012-10-29, 20:02
Re: Setting Charset in getBytes() call.
Any time I've encountered non-English character sets, the answer
has been to use UTF-8. I'm moving forward with that assumption since
it is a safe change. If the group decides to use a different default
encoding, it will be trivial to build on the work that I've done
identifying getBytes() calls. I will post a list of files and my
methodology before an svn checkin.
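
To illustrate the kind of change described above, here is a minimal sketch. It is not the actual Accumulo patch: the class and helper names are hypothetical, and it assumes Java 7 or later for StandardCharsets (on Java 6, Charset.forName("UTF-8") would be the equivalent). A bare getBytes() call uses the platform default charset, so the resulting bytes can differ between machines; passing the charset explicitly makes the result deterministic.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class ExplicitCharsetExample {

    // Encode with an explicit charset instead of the platform default.
    public static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicit charset.
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String row = "caf\u00e9"; // "café": three ASCII letters plus one accented letter

        // The accented letter takes two bytes in UTF-8, so this prints 5.
        System.out.println(encode(row).length);

        // Decoding with the same charset recovers the original string: prints true.
        System.out.println(decode(encode(row)).equals(row));
    }
}
```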

On Mon, Oct 29, 2012 at 4:02 PM, Benson Margulies <[EMAIL PROTECTED]> wrote:
> On Mon, Oct 29, 2012 at 3:18 PM, John Vines <[EMAIL PROTECTED]> wrote:
>> So perhaps we should have ISO-8859-1 as the standard. Mike- do you see any
>> reason to use something beside ISO-8859-1 for the encodings?
>
> I object and caution against *any* plan that involves transcoding from
> X to UTF-16 and back when the data is not always going to be
> valid bytes of encoding X. The only clean solution here is to have an
> API entirely in terms of bytes, and either let the user do getBytes if
> they want to store string data, or provide additional API.
>
>
>
>>
>> John
>>
>> On Mon, Oct 29, 2012 at 3:14 PM, Michael Flester <[EMAIL PROTECTED]> wrote:
>>
>>> > UTF-8 should always be present (according to the JLS), and as a
>>> > multi-byte format should be able to encode any character that you
>>> > would need to.
>>> >
>>>
>>> UTF-8 cannot encode arbitrary data. Not all data that we store in Accumulo
>>> is characters. A safe encoding to use as a pass-through when you
>>> don't know if you are dealing with characters is ISO-8859-1, since we know
>>> that we can make the round trip from bytes to string to bytes without loss.
>>>
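
The round-trip argument quoted above can be checked with a short sketch. The class and method names are hypothetical and the byte values are arbitrary; it assumes Java 7 or later for StandardCharsets. Every byte value maps to exactly one character in ISO-8859-1, so bytes to String to bytes is lossless. In UTF-8, a byte sequence that is not valid UTF-8 is replaced during decoding, so the original bytes cannot be recovered.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
final class RoundTripExample {

    // True if bytes -> String -> bytes with the given charset returns the same bytes.
    public static boolean survivesRoundTrip(byte[] data, Charset charset) {
        String asString = new String(data, charset);
        return Arrays.equals(data, asString.getBytes(charset));
    }

    public static void main(String[] args) {
        // Arbitrary binary data: 0xC3 followed by 0x28 is not a valid UTF-8 sequence,
        // and 0xFF never appears in valid UTF-8.
        byte[] raw = {(byte) 0xC3, (byte) 0x28, (byte) 0xFF, 0x00};

        // ISO-8859-1 maps each byte to one character and back: prints true.
        System.out.println(survivesRoundTrip(raw, StandardCharsets.ISO_8859_1));

        // UTF-8 decoding substitutes malformed input, so the bytes change: prints false.
        System.out.println(survivesRoundTrip(raw, StandardCharsets.UTF_8));
    }
}
```

This only shows that ISO-8859-1 is a lossless pass-through for bytes. It does not address the objection raised in the thread that an API expressed entirely in bytes avoids the transcoding question altogether.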
+
Michael Flester 2012-10-30, 00:27
+
Josh Elser 2012-10-30, 00:46
+
Benson Margulies 2012-10-30, 00:54
+
Josh Elser 2012-10-30, 01:57
+
John Vines 2012-10-30, 02:08
+
David Medinets 2012-10-30, 02:47
+
Josh Elser 2012-10-30, 22:27
+
David Medinets 2012-10-30, 23:47
+
Josh Elser 2012-10-31, 00:21
+
Benson Margulies 2012-10-31, 00:31
+
William Slacum 2012-10-31, 00:41
+
David Medinets 2012-10-31, 02:29
+
John Vines 2012-10-31, 02:35
+
Christopher Tubbs 2012-10-31, 18:02
+
Marc Parisi 2012-11-02, 12:24
+
Benson Margulies 2012-11-02, 19:56
+
John Vines 2012-11-02, 20:18
+
Christopher Tubbs 2012-11-03, 01:54
+
David Medinets 2012-11-03, 03:34
+
Josh Elser 2012-11-02, 23:34
+
Drew Farris 2012-10-30, 01:22
+
Adam Fuchs 2012-10-30, 20:26
+
Ed Kohlwey 2012-10-30, 01:44
+
Ed Kohlwey 2012-10-30, 01:54
+
Eric Newton 2012-10-30, 20:02
+
Marc Parisi 2012-10-30, 22:28
+
Marc Parisi 2012-10-30, 22:31
+
Benson Margulies 2012-10-30, 23:26