Accumulo, mail # dev - Setting Charset in getBytes() call.

David Medinets 2012-10-28, 21:50

In this comment, John mentioned that all getBytes() method calls
should be changed to use UTF8. There are about 1,800 getBytes() calls
and not all of them involve String objects. I am working on ways to
identify a subset of these calls to change.

I have created https://issues.apache.org/jira/browse/ACCUMULO-836 to
track this issue.

Should we create one static Charset object?

  import java.nio.charset.Charset;
  public class AccumuloDefaultCharset {
    public static final Charset UTF8 = Charset.forName("UTF-8");
  }

Should we use a static constant?

  public static final String UTF8 = "UTF-8";
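
To make the trade-off between the two options concrete, here is a minimal sketch. The class and method names (CharsetOptions, encodeWithCharset, encodeWithName) are made up for illustration and are not existing Accumulo code. As far as I can tell, the main practical difference is that getBytes(Charset) declares no checked exception, while getBytes(String) forces every caller to handle UnsupportedEncodingException.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical holder class; the names here are illustrative only.
final class CharsetOptions {
  // Option 1: one shared Charset object.
  static final Charset UTF8 = Charset.forName("UTF-8");
  // Option 2: one shared charset name.
  static final String UTF8_NAME = "UTF-8";

  // getBytes(Charset) declares no checked exception.
  static byte[] encodeWithCharset(String s) {
    return s.getBytes(UTF8);
  }

  // getBytes(String) declares UnsupportedEncodingException,
  // so each call site needs a try/catch or a throws clause.
  static byte[] encodeWithName(String s) {
    try {
      return s.getBytes(UTF8_NAME);
    } catch (UnsupportedEncodingException e) {
      // UTF-8 support is required on every Java platform.
      throw new AssertionError(e);
    }
  }

  public static void main(String[] args) {
    String s = "caf\u00e9"; // contains one non-ASCII character
    // Both options produce the same bytes.
    System.out.println(Arrays.equals(encodeWithCharset(s), encodeWithName(s)));
    // 'c', 'a', 'f' take one byte each in UTF-8; U+00E9 takes two.
    System.out.println(encodeWithCharset(s).length);
  }
}
```

With roughly 1,800 call sites, avoiding a try/catch at each one seems like an argument for the Charset object, though that is a judgment call.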

I have found one instance of getBytes() in InputFormatBase:

  protected static byte[] getPassword(Configuration conf) {
    return Base64.decodeBase64(conf.get(PASSWORD, "").getBytes());
  }
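
For discussion, this is roughly what the change might look like at that call site. It is a self-contained sketch under stated assumptions, not the real InputFormatBase code: a plain Map stands in for the Hadoop Configuration, java.util.Base64 (Java 8 and later) stands in for the Base64 class used above, and the PASSWORD key value is invented.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the InputFormatBase snippet.
final class PasswordCharsetSketch {
  static final Charset UTF8 = Charset.forName("UTF-8");
  // Assumed key name; the real constant's value is not shown in this thread.
  static final String PASSWORD = "password";

  // Same shape as getPassword(), but with the charset named explicitly
  // so the result does not depend on the JVM's default charset.
  static byte[] getPassword(Map<String, String> conf) {
    String encoded = conf.getOrDefault(PASSWORD, "");
    return Base64.getDecoder().decode(encoded.getBytes(UTF8));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<String, String>();
    conf.put(PASSWORD, Base64.getEncoder().encodeToString("secret".getBytes(UTF8)));
    System.out.println(new String(getPassword(conf), UTF8));
  }
}
```

Base64 text only uses ASCII characters, so for this particular call UTF-8 should yield the same bytes as any ASCII-compatible default charset.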

Are there any reasons why I can't start specifying the charset? Is
UTF8 the right Charset to use? I am not an expert in non-English
charsets, so guidance would be welcome.