Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Accumulo >> mail # dev >> Setting Charset in getBytes() call.


+
David Medinets 2012-10-28, 21:50
+
Ed Kohlwey 2012-10-28, 22:18
+
William Slacum 2012-10-29, 15:39
+
David Medinets 2012-10-29, 16:00
+
Josh Elser 2012-10-29, 16:21
+
Benson Margulies 2012-10-29, 16:24
+
John Vines 2012-10-29, 16:42
+
Josh Elser 2012-10-29, 16:57
+
David Medinets 2012-10-29, 17:00
+
William Slacum 2012-10-29, 17:13
+
Mike Drob 2012-10-29, 17:16
+
Michael Flester 2012-10-29, 19:14
+
John Vines 2012-10-29, 19:18
+
Benson Margulies 2012-10-29, 20:02
+
David Medinets 2012-10-29, 20:29
+
Michael Flester 2012-10-30, 00:27
+
Josh Elser 2012-10-30, 00:46
+
Benson Margulies 2012-10-30, 00:54
+
Josh Elser 2012-10-30, 01:57
+
John Vines 2012-10-30, 02:08
+
David Medinets 2012-10-30, 02:47
+
Josh Elser 2012-10-30, 22:27
+
David Medinets 2012-10-30, 23:47
+
Josh Elser 2012-10-31, 00:21
+
Benson Margulies 2012-10-31, 00:31
+
William Slacum 2012-10-31, 00:41
+
David Medinets 2012-10-31, 02:29
+
John Vines 2012-10-31, 02:35
+
Christopher Tubbs 2012-10-31, 18:02
+
Marc Parisi 2012-11-02, 12:24
+
Benson Margulies 2012-11-02, 19:56
+
John Vines 2012-11-02, 20:18
+
Christopher Tubbs 2012-11-03, 01:54
+
David Medinets 2012-11-03, 03:34
+
Josh Elser 2012-11-02, 23:34
+
Drew Farris 2012-10-30, 01:22
Copy link to this message
-
Re: Setting Charset in getBytes() call.
On Mon, Oct 29, 2012 at 9:22 PM, Drew Farris <[EMAIL PROTECTED]> wrote:

> I have always wondered if there were cases in the API where users are
> forced to use Text when they would otherwise prefer byte[], e.g: stuffing a
> non utf8 byte[] into a Text object to facilitate storage or sorting. Not
> entirely sure whether Text would complain if this were the case. I suspect
> we should seek to elimimate these if they currently exist.
>

The Text class is essentially a wrapper around a byte[], with some
convenience methods for translating to/from other types. Accumulo only ever
reads bytes out of it, so there is no encoding problem there. We also don't
use most of its convenience methods. Many people see that it is named
"Text" and assume that it only stores human readable text, but that is not
the case. It probably should have been named
"ConvenientByteArrayWrapperWithSomeMemoryEfficiencySupportAndStringOrientedTranslationMethodsThatIsWritableComparable".

I also agree that it would be good to get rid of the reliance on Hadoop's
Text object, especially because people often do not respect getLength() on
the byte[] obtained from getBytes().

Adam
+
Ed Kohlwey 2012-10-30, 01:44
+
Ed Kohlwey 2012-10-30, 01:54
+
Eric Newton 2012-10-30, 20:02
+
Marc Parisi 2012-10-30, 22:28
+
Marc Parisi 2012-10-30, 22:31
+
Benson Margulies 2012-10-30, 23:26