Steinmaurer Thomas 2011-07-26, 13:37
Joey Echeverria 2011-07-26, 16:35
Steinmaurer Thomas 2011-07-27, 06:07
Re: Encoding when using Bytes.toBytes(String)?
Correct.
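
To illustrate the variable width, here is a minimal sketch. It assumes that the plain JDK call String.getBytes(StandardCharsets.UTF_8) yields the same bytes as Bytes.toBytes(String), since the Javadoc says that method encodes as UTF-8. ISO-8859-1 stands in for a single-byte "ANSI" code page, and the class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

public class EncodingWidth {
    // Assumed stand-in for Bytes.toBytes(String): plain UTF-8 encoding.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Single-byte encoding, used here as a stand-in for an "ANSI" code page.
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String ascii = "row-2011-07-26";      // 14 ASCII characters
        String umlauts = "Gr\u00f6\u00dfe";   // 5 characters, 2 of them non-ASCII

        System.out.println(utf8Length(ascii));     // 14: one byte per ASCII character
        System.out.println(utf8Length(umlauts));   // 7: the two non-ASCII characters take 2 bytes each
        System.out.println(latin1Length(umlauts)); // 5: one byte per character
    }
}
```

So for ASCII-only row keys and qualifiers, converting to a single-byte encoding first would not save any space; only non-ASCII characters cost extra bytes in UTF-8.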

On Tue, Jul 26, 2011 at 11:07 PM, Steinmaurer Thomas
<[EMAIL PROTECTED]> wrote:
> Hi!
>
> Thanks. So it isn't a fixed width of 2 bytes per character in general,
> but rather depends on the characters? If so, I think this means I don't
> have to worry about it at all?
>
> Thanks,
> Thomas
>
> -----Original Message-----
> From: Joey Echeverria [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, July 26, 2011 18:36
> To: [EMAIL PROTECTED]
> Subject: Re: Encoding when using Bytes.toBytes(String)?
>
> Bytes.toBytes(String) encodes using UTF-8 [1]. If all of your characters
> are ASCII, then you'll use only one byte per character. ANSI characters
> outside the ASCII range (umlauts, for example) map to multibyte
> sequences in UTF-8.
>
> -Joey
>
> [1]
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/Bytes.html#toBytes(java.lang.String)
>
> On Tue, Jul 26, 2011 at 6:37 AM, Steinmaurer Thomas
> <[EMAIL PROTECTED]> wrote:
>> Hello,
>>
>>
>>
>> We are currently running tests on disk space usage when inserting
>> records into our table. I just want to check: does
>> Bytes.toBytes(String) encode each character with 2 bytes (Unicode)?
>>
>>
>>
>> Since we only have ANSI characters in the row key (~48 characters) and
>> in the qualifier values, I wonder whether we could save disk space by
>> converting the strings to an ANSI string before sending them to the
>> server?
>>
>>
>>
>> Thanks,
>>
>> Thomas
>>
>>
>>
>>
>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434