Accumulo, mail # dev - Setting Charset in getBytes() call.


David Medinets 2012-10-28, 21:50
Ed Kohlwey 2012-10-28, 22:18
William Slacum 2012-10-29, 15:39

Re: Setting Charset in getBytes() call.
David Medinets 2012-10-29, 16:00
I like the idea of making the change explicit in the source code.
Setting the encoding with the JVM property would be easier, but not as
explicit. I have already changed a few dozen of the files. I have free
time today because Hurricane Sandy has closed the offices.
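A minimal sketch of the difference between the two approaches, using a hypothetical class `ExplicitCharsetDemo` (not Accumulo code): the no-argument `String.getBytes()` encodes with `Charset.defaultCharset()`, which on the JVMs of this era was derived from `file.encoding` at startup, while `String.getBytes(Charset)` fixes the encoding in the source regardless of how the JVM was launched. The constant name `UTF8` is an assumption for illustration.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class ExplicitCharsetDemo {
  // Hypothetical constant; "UTF-8" is a charset every Java platform must support.
  static final Charset UTF8 = Charset.forName("UTF-8");

  // Implicit: encodes with Charset.defaultCharset(), so the bytes depend on JVM startup settings.
  static byte[] encodeWithDefault(String s) {
    return s.getBytes();
  }

  // Explicit: always UTF-8, whatever the JVM's default charset happens to be.
  static byte[] encodeUtf8(String s) {
    return s.getBytes(UTF8);
  }

  public static void main(String[] args) {
    String s = "caf\u00e9"; // "café"; the last character is outside ASCII
    System.out.println("default charset: " + Charset.defaultCharset());
    System.out.println("implicit: " + Arrays.toString(encodeWithDefault(s)));
    System.out.println("explicit: " + Arrays.toString(encodeUtf8(s))); // explicit: [99, 97, 102, -61, -87]
  }
}
```

The "implicit" line changes with the platform default, while the "explicit" line does not. In common JDKs the default charset is also cached once initialized, so `file.encoding` has to be set on the command line rather than later with `System.setProperty`.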

On Mon, Oct 29, 2012 at 11:39 AM, William Slacum
<[EMAIL PROTECTED]> wrote:
> Isn't it easier to just set the JVM property `file.encoding`?
>
> On Sun, Oct 28, 2012 at 3:18 PM, Ed Kohlwey <[EMAIL PROTECTED]> wrote:
>
>> If you use a private static field in each class for the charset, it will
>> effectively be a singleton, because charsets are cached by Charset.forName.
>> IMHO this is a somewhat cleaner approach than having lots of static imports
>> from utility classes full of constants.
>> On Oct 28, 2012 5:50 PM, "David Medinets" <[EMAIL PROTECTED]>
>> wrote:
>>
>> >
>> > https://issues.apache.org/jira/browse/ACCUMULO-241?focusedCommentId=13449680&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13449680
>> >
>> > In this comment, John mentioned that all getBytes() method calls
>> > should be changed to use UTF8. There are about 1,800 getBytes() calls
>> > and not all of them involve String objects. I am working on ways to
>> > identify a subset of these calls to change.
>> >
>> > I have created https://issues.apache.org/jira/browse/ACCUMULO-836 to
>> > track this issue.
>> >
>> > Should we create one static Charset object?
>> >
>> >   import java.nio.charset.Charset;
>> >
>> >   public final class AccumuloDefaultCharset {
>> >     public static final Charset UTF8 = Charset.forName("UTF-8");
>> >   }
>> >
>> > Or should we use a static String constant?
>> >
>> >   public static final String UTF8 = "UTF-8";
>> >
>> > I have found one instance of getBytes() in InputFormatBase:
>> >
>> >   protected static byte[] getPassword(Configuration conf) {
>> >     return Base64.decodeBase64(conf.get(PASSWORD, "").getBytes());
>> >   }
>> >
>> > Are there any reasons why I can't start specifying the charset? Is
>> > UTF8 the right Charset to use? I am not an expert in non-English
>> > charsets, so guidance would be welcome.
>> >
>>
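A sketch combining the options raised in this message, under stated assumptions: a shared holder class `AccumuloDefaultCharset` (the name proposed above, with `final` added so the field cannot be reassigned), a `private static final` field in the using class (Ed's suggestion), and the `getPassword` call rewritten to pass the charset explicitly. To keep it self-contained, the Hadoop `Configuration` and commons-codec `Base64` used by the real `InputFormatBase` are replaced with a plain `Map` and `java.util.Base64` (Java 8+), and the class `PasswordConfigSketch` and key value `"password"` are hypothetical; this is not the Accumulo code.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Option 1: one shared holder class (name taken from the message above).
final class AccumuloDefaultCharset {
  // final, so no caller can reassign the shared charset at runtime.
  public static final Charset UTF8 = Charset.forName("UTF-8");

  private AccumuloDefaultCharset() {} // holds constants only; never instantiated
}

public class PasswordConfigSketch {
  // Option 2 (Ed's suggestion): a private static final field in each class that needs it.
  private static final Charset UTF8 = Charset.forName("UTF-8");

  // Hypothetical stand-in for the PASSWORD configuration key used by InputFormatBase.
  static final String PASSWORD = "password";

  // getPassword() with the charset made explicit. A Map and java.util.Base64 stand in
  // for Hadoop's Configuration and commons-codec's Base64 so the sketch is self-contained.
  protected static byte[] getPassword(Map<String, String> conf) {
    String encoded = conf.getOrDefault(PASSWORD, "");
    return Base64.getDecoder().decode(encoded.getBytes(UTF8));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(PASSWORD, Base64.getEncoder().encodeToString("secret".getBytes(UTF8)));

    System.out.println(new String(getPassword(conf), UTF8)); // secret
    // Both options name the same charset; "UTF8" is also accepted as an alias.
    System.out.println(UTF8.equals(AccumuloDefaultCharset.UTF8)); // true
    System.out.println(Charset.forName("UTF8").name()); // UTF-8
  }
}
```

Passing a `Charset` object rather than a `String` name (the second option above) has one more practical advantage: `String.getBytes(Charset)` declares no checked exception, whereas `String.getBytes(String)` throws `UnsupportedEncodingException`, which every call site would have to handle.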

Josh Elser 2012-10-29, 16:21
Benson Margulies 2012-10-29, 16:24
John Vines 2012-10-29, 16:42
Josh Elser 2012-10-29, 16:57
David Medinets 2012-10-29, 17:00
William Slacum 2012-10-29, 17:13
Mike Drob 2012-10-29, 17:16
Michael Flester 2012-10-29, 19:14
John Vines 2012-10-29, 19:18
Benson Margulies 2012-10-29, 20:02
David Medinets 2012-10-29, 20:29
Michael Flester 2012-10-30, 00:27
Josh Elser 2012-10-30, 00:46
Benson Margulies 2012-10-30, 00:54
Josh Elser 2012-10-30, 01:57
John Vines 2012-10-30, 02:08
David Medinets 2012-10-30, 02:47
Josh Elser 2012-10-30, 22:27
David Medinets 2012-10-30, 23:47
Josh Elser 2012-10-31, 00:21
Benson Margulies 2012-10-31, 00:31
William Slacum 2012-10-31, 00:41
David Medinets 2012-10-31, 02:29
John Vines 2012-10-31, 02:35
Christopher Tubbs 2012-10-31, 18:02
Marc Parisi 2012-11-02, 12:24
Benson Margulies 2012-11-02, 19:56
John Vines 2012-11-02, 20:18
Christopher Tubbs 2012-11-03, 01:54
David Medinets 2012-11-03, 03:34
Josh Elser 2012-11-02, 23:34
Drew Farris 2012-10-30, 01:22
Adam Fuchs 2012-10-30, 20:26
Ed Kohlwey 2012-10-30, 01:44
Ed Kohlwey 2012-10-30, 01:54
Eric Newton 2012-10-30, 20:02
Marc Parisi 2012-10-30, 22:28
Marc Parisi 2012-10-30, 22:31
Benson Margulies 2012-10-30, 23:26