Accumulo >> mail # dev >> Setting Charset in getBytes() call.


Ed Kohlwey 2012-10-30, 01:44
Re: Setting Charset in getBytes() call.
Also, on the topic of byte arrays: we should go one better than HBase and
use ByteBuffers. They are more reusable, and long-lived buffers can be
allocated outside the heap to take advantage of OS I/O optimizations.
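
To make the ByteBuffer point concrete, here is a minimal sketch using only java.nio. The class name ByteBufferSketch and the helper toArray are made up for illustration and are not part of the Accumulo API. It shows one direct (off-heap) buffer being reused across writes, and an existing byte[] being wrapped without a copy.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; nothing here is Accumulo API.
public class ByteBufferSketch {

    // Copies the readable bytes into a fresh array. Working on a duplicate
    // keeps the caller's position and limit untouched.
    static byte[] toArray(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate(); // shares content, independent position/limit
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        // One long-lived buffer allocated outside the Java heap and reused.
        ByteBuffer direct = ByteBuffer.allocateDirect(64);
        for (String row : new String[] {"row1", "row22"}) {
            direct.clear();                                   // reset for the next write
            direct.put(row.getBytes(StandardCharsets.UTF_8)); // explicit charset
            direct.flip();                                    // switch to reading
            System.out.println(direct.remaining() + " bytes, direct=" + direct.isDirect());
        }

        // Wrapping an existing array exposes a slice of it without copying.
        byte[] raw = {(byte) 0xff, 0x00, 0x41};
        ByteBuffer wrapped = ByteBuffer.wrap(raw, 1, 2);
        System.out.println(toArray(wrapped).length);
    }
}
```

Whether a direct buffer actually pays off depends on how long it lives and how it is used for I/O, so treat the allocateDirect call as something to measure rather than a default.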

The current reliance on Text is, in my opinion, the greatest deficit of the
client API. I have been fiddling with creating a new API, similar to the
work Keith did with typo, but instead looking at introducing generic
superclasses to reduce the API profile.
On Oct 29, 2012 9:22 PM, "Drew Farris" <[EMAIL PROTECTED]> wrote:

> I have always wondered whether there are cases in the API where users are
> forced to use Text when they would otherwise prefer byte[], e.g. stuffing a
> non-UTF-8 byte[] into a Text object to facilitate storage or sorting. I am
> not entirely sure whether Text would complain in that case. I suspect
> we should seek to eliminate these cases if they currently exist.
>
> Speaking strictly of user data, I agree that fundamentally, every operation
> should be based upon byte[]. API methods providing Text and String based
> calls should be convenience methods where the conversion of text to/from
> bytes is handled explicitly (not relying on platform default encoding or
> properties) and transparently (doing something sensible when the user
> doesn't care or is unaware of the issues surrounding character encoding).
>
> Regarding UTF-8, is there a need to support arbitrary character encodings
> when persisting bytes to Accumulo? Think byte order for lexical sorting,
> fixed vs. variable length, etc. Perhaps it would not be unreasonable to
> support explicitly stating a character encoding on table creation?
>
> Drew
>  On Oct 29, 2012 8:47 PM, "Josh Elser" <[EMAIL PROTECTED]> wrote:
>
> > +1 Mike.
> >
> > 1. It would be hard for me to believe Key/Value are ever handled
> > internally in terms of Strings, but, if such a case does exist, it would be
> > extremely prudent to fix.
> >
> > 2. FWIW, the Shell does use ISO-8859-1 as its charset which is referenced
> > by other commands [1,2]. It would be good to double check all of the other
> > commands.
> >
> > [1] https://github.com/apache/accumulo/blob/trunk/core/src/main/java/org/apache/accumulo/core/util/shell/Shell.java
> >
> > [2] https://github.com/apache/accumulo/blob/trunk/core/src/main/java/org/apache/accumulo/core/util/shell/commands/InsertCommand.java
> >
> > On 10/29/2012 8:27 PM, Michael Flester wrote:
> >
> >> I agree with Benson entirely with one caveat. It seems to me that there
> >> might be two categories of things being discussed
> >>
> >>    1. User data (keys and values)
> >>    2. Ancillary things needed for operation of Accumulo (passwords).
> >>
> >> These could well be considered separately. Trying to do anything with
> >> keys and values other than treating them as bytes all of the time
> >> I find quite scary.
> >>
> >> And if this is only being done to satisfy pmd or findbugs, those tools
> >> can be convinced to modify their reporting about this issue.
> >>
> >>
>
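
The encoding concerns raised in the quoted messages can be illustrated with plain java.lang.String and java.nio.charset. This is a sketch only: the class ExplicitCharsetSketch and its methods are hypothetical names, and it deliberately says nothing about how Hadoop's Text behaves. It shows (a) conversions that name their charset instead of relying on the platform default, (b) that arbitrary bytes do not survive a round trip through a UTF-8 String, while they do through ISO-8859-1, and (c) a lexical comparison that treats bytes as unsigned.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; nothing here is Accumulo or Hadoop API.
public class ExplicitCharsetSketch {

    // Convenience conversions that name the charset, so the result does not
    // depend on the JVM's platform default encoding.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String fromBytes(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // Lexicographic comparison treating each byte as unsigned (0..255).
    // Java bytes are signed, so a naive a[i] - b[i] would sort 0x80 before 0x7f.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length; // shorter array sorts first on a shared prefix
    }

    public static void main(String[] args) {
        // The same text has different lengths under different encodings.
        String s = "caf\u00e9";
        System.out.println(toBytes(s).length);                              // 5 bytes in UTF-8
        System.out.println(s.getBytes(StandardCharsets.ISO_8859_1).length); // 4 bytes in ISO-8859-1

        // 0xff and 0xfe are not valid UTF-8, so decoding replaces them and
        // re-encoding yields different bytes.
        byte[] raw = {(byte) 0xff, (byte) 0xfe};
        System.out.println(Arrays.equals(raw, toBytes(fromBytes(raw)))); // false

        // ISO-8859-1 maps every byte value to one char, so the round trip is lossless.
        byte[] viaLatin1 = new String(raw, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(raw, viaLatin1)); // true

        System.out.println(compareUnsigned(new byte[] {0x7f}, new byte[] {(byte) 0x80}) < 0); // true
    }
}
```

The lossy UTF-8 round trip is the practical reason to keep user keys and values as bytes end to end and to confine charset handling to convenience methods at the edge of the API.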