Accumulo >> mail # dev >> Setting Charset in getBytes() call.


Re: Setting Charset in getBytes() call.
Isn't it easier to just set the JVM property `file.encoding`?

On Sun, Oct 28, 2012 at 3:18 PM, Ed Kohlwey <[EMAIL PROTECTED]> wrote:

> If you use a private static field in each class for the charset, it will
> basically be a singleton because charsets are cached in Charset.forName().
> IMHO this is a somewhat cleaner approach than having lots of static imports
> to utility classes with lots of constants in them.
> On Oct 28, 2012 5:50 PM, "David Medinets" <[EMAIL PROTECTED]>
> wrote:
>
> > https://issues.apache.org/jira/browse/ACCUMULO-241?focusedCommentId=13449680&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13449680
> >
> > In this comment, John mentioned that all getBytes() method calls
> > should be changed to use UTF8. There are about 1,800 getBytes() calls
> > and not all of them involve String objects. I am working on ways to
> > identify a subset of these calls to change.
> >
> > I have created https://issues.apache.org/jira/browse/ACCUMULO-836 to
> > track this issue.
> >
> > Should we create one static Charset object?
> >
> >   class AccumuloDefaultCharset {
> >     public static final Charset UTF8 = Charset.forName("UTF8");
> >   }
> >
> > Should we use a static constant?
> >
> >   public static final String UTF8 = "UTF8";
> >
> > I have found one instance of getBytes() in InputFormatBase:
> >
> >   protected static byte[] getPassword(Configuration conf) {
> >     return Base64.decodeBase64(conf.get(PASSWORD, "").getBytes());
> >   }
> >
> > Are there any reasons why I can't start specifying the charset? Is
> > UTF8 the right Charset to use? I am not an expert in non-English
> > charsets, so guidance would be welcome.
> >
>
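
For reference, here is a minimal sketch of the per-class constant approach discussed above, next to the problem it avoids. It is not Accumulo code: the class name CharsetSketch and the helper toBytes are hypothetical, made up for illustration. It assumes only that UTF-8 is the charset chosen and shows that passing a Charset to getBytes() gives the same bytes regardless of the JVM's file.encoding setting.

```java
import java.nio.charset.Charset;

public class CharsetSketch {
  // Hypothetical per-class constant, as suggested in the thread.
  // UTF-8 is a charset every Java platform is required to support.
  private static final Charset UTF8 = Charset.forName("UTF-8");

  // Hypothetical helper: encode with an explicit charset instead of
  // relying on the platform default used by the no-argument getBytes().
  public static byte[] toBytes(String s) {
    return s.getBytes(UTF8);
  }

  public static void main(String[] args) {
    // Pure ASCII text: one byte per character in UTF-8.
    System.out.println(toBytes("secret").length); // prints 6

    // U+00E9 (e with acute accent) takes two bytes in UTF-8.
    System.out.println(toBytes("\u00e9").length); // prints 2
  }
}
```

With the no-argument getBytes(), the second result would depend on the default charset of the machine running the code, which is the portability concern raised in ACCUMULO-241.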