Accumulo >> mail # dev >> Setting Charset in getBytes() call.


David Medinets 2012-10-28, 21:50
Ed Kohlwey 2012-10-28, 22:18
William Slacum 2012-10-29, 15:39
David Medinets 2012-10-29, 16:00
Josh Elser 2012-10-29, 16:21
Benson Margulies 2012-10-29, 16:24
John Vines 2012-10-29, 16:42
Josh Elser 2012-10-29, 16:57
David Medinets 2012-10-29, 17:00
William Slacum 2012-10-29, 17:13
Mike Drob 2012-10-29, 17:16
Michael Flester 2012-10-29, 19:14
John Vines 2012-10-29, 19:18
Benson Margulies 2012-10-29, 20:02
David Medinets 2012-10-29, 20:29
Michael Flester 2012-10-30, 00:27
Josh Elser 2012-10-30, 00:46
Benson Margulies 2012-10-30, 00:54
Josh Elser 2012-10-30, 01:57
John Vines 2012-10-30, 02:08
David Medinets 2012-10-30, 02:47
Josh Elser 2012-10-30, 22:27
David Medinets 2012-10-30, 23:47
Josh Elser 2012-10-31, 00:21
Benson Margulies 2012-10-31, 00:31
Re: Setting Charset in getBytes() call.
Accumulo may not be just a set of servers, but it is designed to be a set
of processes, each running in its own JVM. I think this mostly boils
down to an API issue, however: if Accumulo deals with users' data in
terms of bytes, then the encoding question is put back on the user, which
I'm fine with as a trade-off between configuration and convention.
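
To make the "put back on the user" point concrete, here is a minimal sketch, not Accumulo code. The `store` method is a hypothetical stand-in for any byte-oriented API; the class and method names are made up for illustration. It shows that the caller has to pick a charset on both the encode and the decode side, and that a mismatch silently changes the string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: none of these names come from Accumulo.
class ExplicitCharsetDemo {

    // Hypothetical byte-oriented API: it stores and returns bytes untouched
    // and knows nothing about how they were encoded.
    static byte[] store(byte[] value) {
        return value.clone();
    }

    // The caller chooses the charset explicitly instead of relying on
    // the JVM default (String.getBytes() with no argument).
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    static String decode(byte[] b, Charset cs) {
        return new String(b, cs);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // four characters, one of them non-ASCII

        byte[] stored = store(encode(text, StandardCharsets.UTF_8));

        // Same charset on both sides: the string survives the round trip.
        String same = decode(stored, StandardCharsets.UTF_8);
        System.out.println("round trip ok: " + same.equals(text));

        // Different charset on the read side: the bytes are intact,
        // but the decoded string no longer matches the original.
        String mismatched = decode(stored, StandardCharsets.ISO_8859_1);
        System.out.println("mismatch ok:   " + mismatched.equals(text));
    }
}
```

With a bytes-only API the library never has to guess; the cost is that every caller must be consistent about the charset it uses.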

There are other cases beyond simply a client API, though, namely
configuration. I'm more comfortable with enforcing some standard there.

On Tue, Oct 30, 2012 at 8:31 PM, Benson Margulies <[EMAIL PROTECTED]> wrote:

> On Tue, Oct 30, 2012 at 8:21 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
> > On 10/30/2012 7:47 PM, David Medinets wrote:
> >>>
> >>> My issue with this is that you have now hard-coded the fact that everyone
> >>> else is going to use UTF-8.
> >>
> >>
> >> Who is everyone else? I agree that I have hard-coded the use of UTF-8.
> >> On the other hand, I've merely codified an existing practice. Thus the
> >> issue is now exposed, the places the convention is used are defined.
> >> Once a consensus is reached, we can implement it with confidence.
> >
> >
> > "Everyone else" is everyone who builds Accumulo since you committed your
> > changes and uses it. Ignoring that, forcing a single charset isn't the big
> > issue here (as we've *all* agreed that UTF-8 should not cause any
> > data-correctness issues) so for now I'll just drop it as it's just creating
> > confusion.
> >
> > My issue is *how* you implemented the default charset. We already have 3
> > people (Marc, Bill and myself) who have stated that we believe inline
> > charset declaration is not the correct implementation and that using the JVM
> > property is the better implementation.
> >
> > I'd encourage others to weigh in to form a complete consensus and shift the
> > discussion to that implementation if needed.
> >
> >>
> >>> way to fix the problem. I still contest that setting the desired encoding
> >>> (via the appropriate JVM property like Bill Slacum initially suggested) is
> >>> the proper way to address the issue.
> >>
> >>
> >> It is easy to do both. Create a ByteEncodingInitializer (or somesuch)
> >> class that reads the JVM property and defines a globally used Charset.
> >> Then find those utf8 definitions and usages and replace them with the
> >> globally-defined value.
> >
> >
> > Again, by setting the 'file.encoding' JVM parameter, such a class is
> > unnecessary because it should be handled internally by Java. For Oracle/Sun
> > JDK and OpenJDK, setting the "file.encoding" parameter at run time will use
> > the charset you provide without actually changing any code.
>
> If Accumulo were only a pile of servers, you could do this. You could
> say that part of the configuration process for the servers is to
> specify the desired encoding to file.encoding, and your shell scripts
> could set UTF-8 by default.
>
> But Accumulo is *not* just a pile of servers. Setting file.encoding
> affects the entire JVM. A webapp that uses Accumulo now would need to
> have the entire servlet container have a particular setting of
> file.encoding. This just does not work in the wild. Even without the
> servlet container issue, a user of Accumulo may need to plug it into
> an existing code base that has other reasons to set file.encoding, and
> will not like it when Accumulo starts to corrupt his or her string
> data.
>
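
The quoted exchange proposes a class that reads a property and defines one globally used Charset, and objects that `file.encoding` is JVM-wide. A rough sketch of how those two points could fit together follows. It assumes a library-specific property rather than `file.encoding`; the property name `example.byte.encoding`, the class name, and the fall-back-to-UTF-8 behavior are all hypothetical and not taken from the Accumulo code base.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch of the "ByteEncodingInitializer (or somesuch)" idea from the thread.
// All names here are assumptions made for illustration.
final class ByteEncoding {

    // Hypothetical library-specific property, deliberately NOT file.encoding,
    // so an embedding application's JVM-wide setting is left alone.
    static final String PROPERTY = "example.byte.encoding";

    // Resolve a charset name, falling back to UTF-8 when the name is
    // missing, malformed, or not supported by this JVM.
    static Charset resolve(String name) {
        if (name == null || name.trim().isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(name.trim());
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            return StandardCharsets.UTF_8;
        }
    }

    // The single, globally used charset, fixed when the class is loaded.
    static final Charset CHARSET = resolve(System.getProperty(PROPERTY));

    static byte[] encode(String s) {
        return s.getBytes(CHARSET);
    }

    static String decode(byte[] b) {
        return new String(b, CHARSET);
    }

    public static void main(String[] args) {
        // The two values are independent: changing one does not change the other.
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println("library charset:     " + CHARSET);
    }
}
```

Under these assumptions, a server launch script could pass `-Dexample.byte.encoding=UTF-8`, while a webapp embedding the library would get UTF-8 by default without touching the servlet container's `file.encoding`. Whether silently falling back to UTF-8 on a bad name is preferable to failing fast is a design choice this sketch does not settle.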
David Medinets 2012-10-31, 02:29
John Vines 2012-10-31, 02:35
Christopher Tubbs 2012-10-31, 18:02
Marc Parisi 2012-11-02, 12:24
Benson Margulies 2012-11-02, 19:56
John Vines 2012-11-02, 20:18
Christopher Tubbs 2012-11-03, 01:54
David Medinets 2012-11-03, 03:34
Josh Elser 2012-11-02, 23:34
Drew Farris 2012-10-30, 01:22
Adam Fuchs 2012-10-30, 20:26
Ed Kohlwey 2012-10-30, 01:44
Ed Kohlwey 2012-10-30, 01:54
Eric Newton 2012-10-30, 20:02
Marc Parisi 2012-10-30, 22:28
Marc Parisi 2012-10-30, 22:31
Benson Margulies 2012-10-30, 23:26