Accumulo >> mail # dev >> Shell Charset?


Drew Farris 2013-05-05, 22:49
Re: Shell Charset?
The shell should accept a Java String from the console (leaving the
job of converting input bytes to a Java String to the locale-dependent
console), and should only translate it to UTF-8 when it sends it to
Accumulo, I think.
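
A minimal sketch of that idea, using only the JDK: the argument stays a Java String until the last moment and is encoded as UTF-8 at the boundary. The class name RowArgEncoding and the helper toRowBytes are hypothetical and are not part of the Accumulo shell; the string literal uses Unicode escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

public class RowArgEncoding {
    // Hypothetical helper (not an Accumulo API): encode a shell argument
    // as UTF-8 only at the point where it is handed off as row bytes.
    static byte[] toRowBytes(String arg) {
        return arg.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same six characters as the test string in the quoted message,
        // written as escapes to be independent of source-file encoding.
        String row = "\u672C\u6761\u76EE\u662F\u4ECB\u7D39";
        byte[] encoded = toRowBytes(row);

        // Each of these characters takes three bytes in UTF-8.
        System.out.println(encoded.length); // prints 18

        // Decoding with the same charset recovers the original String.
        String decoded = new String(encoded, StandardCharsets.UTF_8);
        System.out.println(decoded.equals(row)); // prints true
    }
}
```

Whether the console itself produces the right String from the typed bytes is a separate, locale-dependent question that this sketch does not address.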

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
On Sun, May 5, 2013 at 6:49 PM, Drew Farris <[EMAIL PROTECTED]> wrote:
> In o.a.a.core.util.shell.commands.OptUtil, I notice that getStartRow and
> getEndRow use the following snippet to read their arguments:
>
> new Text(cl.getOptionValue(END_ROW_OPT).getBytes(Shell.CHARSET));
>
> Here, Shell.CHARSET is set to ISO-8859-1
>
> This seems to mean that if I use UTF-8 characters (unescaped) from the
> shell to set my begin or end row, I will not get what I expect, because
> the conversion from String to bytes would be performed using the incorrect
> character set.
>
> For example, in the following snippet, testISO fails while testUTF succeeds
> (when the encoding of the source file is UTF-8):
>
>
>   @Test
>   public void testISO() throws Exception {
>     String s = "本条目是介紹";
>     String charset = "ISO-8859-1";
>     Text t = new Text(s.getBytes(charset));
>     Assert.assertEquals(s, t.toString());
>   }
>
>   @Test
>   public void testUTF() throws Exception {
>     String s = "本条目是介紹";
>     String charset = "UTF-8";
>     Text t = new Text(s.getBytes(charset));
>     Assert.assertEquals(s, t.toString());
>   }
>
> Possibly this should be locale-dependent behavior? Also, perhaps I'm
> missing the fact that the Shell is not supposed to support UTF-8 characters
> in start and end ranges, and users must escape their strings appropriately
> (which would be a bit of a pain).
>
>
> - Drew
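
To make the failure in the quoted testISO concrete, here is a small JDK-only sketch of why it fails. It assumes (as is the case for Hadoop's Text) that the stored bytes are later decoded as UTF-8; the class name Latin1Loss and the helper encodeThenDecodeUtf8 are hypothetical and stand in for the Text round trip. String.getBytes(Charset) replaces characters the charset cannot represent with that charset's replacement bytes, which for ISO-8859-1 is a single '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1Loss {
    // Hypothetical stand-in for "new Text(s.getBytes(cs)).toString()":
    // encode with the given charset, then decode the bytes as UTF-8.
    static String encodeThenDecodeUtf8(String s, Charset encodeWith) {
        return new String(s.getBytes(encodeWith), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same six characters as the quoted tests, written as escapes.
        String s = "\u672C\u6761\u76EE\u662F\u4ECB\u7D39";

        // None of these characters exist in ISO-8859-1, so each one is
        // replaced by '?' during encoding and the original is lost.
        String viaLatin1 = encodeThenDecodeUtf8(s, StandardCharsets.ISO_8859_1);
        System.out.println(viaLatin1); // prints ??????

        // Encoding as UTF-8 preserves every character.
        String viaUtf8 = encodeThenDecodeUtf8(s, StandardCharsets.UTF_8);
        System.out.println(viaUtf8.equals(s)); // prints true
    }
}
```

The loss happens at encoding time, so nothing downstream of the shell can recover the intended row from those bytes.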
John Vines 2013-05-06, 14:55
Keith Turner 2013-05-06, 18:09
Josh Elser 2013-05-06, 18:49
Keith Turner 2013-05-06, 19:42