Accumulo dev mailing list

Setting Charset in getBytes() call.
https://issues.apache.org/jira/browse/ACCUMULO-241?focusedCommentId=13449680&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13449680

In this comment, John suggested that all getBytes() calls should be
changed to specify UTF-8 explicitly. There are about 1,800 getBytes()
calls in the code, and not all of them are on String objects. I am
working on ways to identify the subset of these calls that should change.

I have created https://issues.apache.org/jira/browse/ACCUMULO-836 to
track this issue.

Should we create a single shared Charset constant?

  import java.nio.charset.Charset;

  public class AccumuloDefaultCharset {
    public static final Charset UTF8 = Charset.forName("UTF-8");
  }
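A rough sketch of how call sites might use the Charset option. It assumes the holder class is made final with a private constructor (my addition, not part of the proposal), and CharsetConstantExample is a hypothetical name. String.getBytes(Charset) and new String(byte[], Charset) exist since Java 6 and declare no checked exception, so each call site would stay a one-line change.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Holder class from the proposal; final + private constructor are assumptions.
final class AccumuloDefaultCharset {
  // "UTF8" also works as an alias, but "UTF-8" is the canonical name.
  public static final Charset UTF8 = Charset.forName("UTF-8");

  private AccumuloDefaultCharset() {} // constants only, never instantiated
}

public class CharsetConstantExample {
  public static void main(String[] args) {
    String row = "row1";
    // No try/catch needed: the Charset overload throws no checked exception.
    byte[] bytes = row.getBytes(AccumuloDefaultCharset.UTF8);
    // Decoding with the same constant restores the original string.
    String back = new String(bytes, AccumuloDefaultCharset.UTF8);
    System.out.println(Arrays.toString(bytes) + " -> " + back); // [114, 111, 119, 49] -> row1
  }
}
```

Because the Charset is looked up once in a static initializer, repeated calls also skip the per-call name lookup that the String overload performs.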

Or should we use a String constant holding the charset name?

  public static final String UTF8 = "UTF-8";
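One practical difference between the two options, sketched below: String.getBytes(String) declares the checked UnsupportedEncodingException, even though every Java platform is required to support UTF-8, so each call site would need a try/catch or a throws clause. The class and method names here are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class StringConstantExample {
  // The String-constant option from the proposal.
  public static final String UTF8 = "UTF-8";

  // getBytes(String) forces handling of a checked exception.
  static byte[] encodeByName(String s) {
    try {
      return s.getBytes(UTF8);
    } catch (UnsupportedEncodingException e) {
      // Cannot happen for UTF-8, but the compiler still requires this handler.
      throw new IllegalStateException(e);
    }
  }

  // getBytes(Charset) needs no handler at all.
  static byte[] encodeByCharset(String s) {
    return s.getBytes(Charset.forName(UTF8));
  }

  public static void main(String[] args) {
    String s = "caf\u00e9"; // "café": the accented e takes 2 bytes in UTF-8
    System.out.println(encodeByName(s).length);    // 5
    System.out.println(encodeByCharset(s).length); // 5
  }
}
```

Both produce the same bytes; the Charset constant mainly saves boilerplate at each of the call sites being changed.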

Here is one such call I found, in InputFormatBase:

  protected static byte[] getPassword(Configuration conf) {
    return Base64.decodeBase64(conf.get(PASSWORD, "").getBytes());
  }
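A self-contained sketch of what this call might look like with the charset named explicitly. To keep it runnable without Hadoop or commons-codec, it is a hypothetical stand-in: it takes the stored Base64 string directly instead of reading PASSWORD from a Configuration, and it swaps in java.util.Base64 (Java 8+) for commons-codec's Base64.decodeBase64.

```java
import java.nio.charset.Charset;
import java.util.Base64;

public class PasswordDecodeExample {
  private static final Charset UTF8 = Charset.forName("UTF-8");

  // Hypothetical stand-in for InputFormatBase.getPassword(Configuration).
  static byte[] getPassword(String encoded) {
    // Name the charset instead of relying on the platform default.
    return Base64.getDecoder().decode(encoded.getBytes(UTF8));
  }

  public static void main(String[] args) {
    String stored = Base64.getEncoder().encodeToString("secret".getBytes(UTF8));
    System.out.println(new String(getPassword(stored), UTF8)); // secret
  }
}
```

Base64 text is plain ASCII, so for this particular call UTF-8 and any ASCII-compatible default produce identical bytes; naming the charset here mostly documents intent and removes the dependence on the JVM's default.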

Is there any reason I shouldn't start specifying the charset
explicitly? Is UTF-8 the right charset to use? I am not an expert in
non-English charsets, so guidance would be welcome.
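On the UTF-8 question, a small sketch of why it is a common choice: UTF-8 can encode every well-formed Java String, while a single-byte charset such as ISO-8859-1 (a possible platform default) silently replaces unmappable characters with '?'. The class and method names are hypothetical. One caveat worth checking: on hosts whose default was not UTF-8, switching changes the bytes produced for non-ASCII text, which may matter for data already written.

```java
import java.nio.charset.Charset;

public class CharsetRoundTripExample {
  // True if encoding then decoding with cs gives back the same string.
  static boolean roundTrips(String s, Charset cs) {
    return s.equals(new String(s.getBytes(cs), cs));
  }

  public static void main(String[] args) {
    Charset utf8 = Charset.forName("UTF-8");
    Charset latin1 = Charset.forName("ISO-8859-1");
    String text = "\u65e5\u672c\u8a9e"; // Japanese for "Japanese language"
    System.out.println(roundTrips(text, utf8));   // true
    System.out.println(roundTrips(text, latin1)); // false: each char becomes '?'
  }
}
```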