Accumulo >> mail # dev >> Shell Charset?


Re: Shell Charset?
Sounds like we should grep through the codebase and make sure the only
charset we're using is UTF-8...
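
One way to make such an audit stick might be to route every String-to-bytes conversion through a single helper. The sketch below is only an illustration: the class name `RowBytes` and its methods are hypothetical and not part of Accumulo, and `StandardCharsets` assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not an Accumulo class): one place that fixes the
// charset used when row strings are turned into bytes and back.
final class RowBytes {
    private RowBytes() {}

    // Always encode with UTF-8, independent of the platform default charset.
    static byte[] encode(String row) {
        return row.getBytes(StandardCharsets.UTF_8);
    }

    // Always decode with UTF-8, mirroring encode().
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

With a helper like this, a grep for `getBytes(` outside the helper would flag call sites that still pick their own charset.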
On Sun, May 5, 2013 at 8:08 PM, Christopher <[EMAIL PROTECTED]> wrote:

> The shell should accept a Java "String" from the console (leaving
> the job of converting input bytes to a Java String argument to the
> locale-dependent console), and should only translate it to UTF-8
> when it sends it to Accumulo, I think.
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Sun, May 5, 2013 at 6:49 PM, Drew Farris <[EMAIL PROTECTED]> wrote:
> > In o.a.a.core.util.shell.commands.OptUtil, I notice that getStartRow and
> > getEndRow use the following snippet to read their arguments:
> >
> > new Text(cl.getOptionValue(END_ROW_OPT).getBytes(Shell.CHARSET));
> >
> > Here, Shell.CHARSET is set to ISO-8859-1
> >
> > This seems to mean that if I use UTF-8 characters (unescaped) from the
> > shell to set my begin or end row, I will not get what I expect, because
> > the conversion from String to bytes would be performed using the
> > incorrect character set.
> >
> > For example, in the following snippet, testISO fails while testUTF
> > succeeds (when the encoding of the source file is UTF-8):
> >
> >   @Test
> >   public void testISO() throws Exception {
> >     String s = "本条目是介紹";
> >     String charset = "ISO-8859-1";
> >     Text t = new Text(s.getBytes(charset));
> >     Assert.assertEquals(s, t.toString());
> >   }
> >
> >   @Test
> >   public void testUTF() throws Exception {
> >     String s = "本条目是介紹";
> >     String charset = "UTF-8";
> >     Text t = new Text(s.getBytes(charset));
> >     Assert.assertEquals(s, t.toString());
> >   }
> >
> > Possibly this should be locale-dependent behavior? Also, perhaps I'm
> > missing the fact that the Shell is not supposed to support UTF-8
> > characters in start and end ranges, and users must escape their strings
> > appropriately. (Which would be a bit of a pain).
> >
> >
> > - Drew
>
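
The mismatch described in the quoted report can be reproduced without Hadoop on the classpath. The sketch below is a stand-in, not the Accumulo code path: it assumes that decoding the bytes as UTF-8 with `new String(...)` behaves like `Text.toString()`, and the class and method names are made up for illustration. The test string is the one from the report, written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; stands in for the Text-based tests quoted above.
final class CharsetRoundTrip {
    private CharsetRoundTrip() {}

    // Encode with the given charset, then decode the bytes as UTF-8
    // (assumed here to match how Text interprets its contents).
    static String encodeThenDecodeAsUtf8(String s, Charset encodeWith) {
        byte[] bytes = s.getBytes(encodeWith);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same six characters as in the quoted tests.
        String s = "\u672C\u6761\u76EE\u662F\u4ECB\u7D39";

        // UTF-8 on both sides: the string survives the round trip.
        System.out.println(s.equals(
            encodeThenDecodeAsUtf8(s, StandardCharsets.UTF_8)));      // true

        // ISO-8859-1 cannot represent these characters, so getBytes
        // substitutes '?' for each one and the original text is lost.
        System.out.println(s.equals(
            encodeThenDecodeAsUtf8(s, StandardCharsets.ISO_8859_1))); // false
    }
}
```

Pure ASCII input round-trips under either charset, which may be why the problem only shows up with non-ASCII start and end rows.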