kafka0102 kafka0102 2012-01-04, 11:59
Re: encoding problem for ruby client
This sounds like the Ruby implementation does not correctly use UTF-8 on
your platform when encoding strings. It may be a bug, but I am not
familiar enough with the Ruby implementation to know for sure.

The Avro specification states that "a string is encoded as a long followed
by that many bytes of UTF-8 encoded character data."
(http://avro.apache.org/docs/current/spec.html#binary_encode_primitive).
If you think that the Ruby implementation does not adhere to the spec,
please file a bug in JIRA.
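For illustration, here is a minimal Ruby sketch of that string encoding. It is written directly from the spec, not taken from the Avro Ruby library, and write_long, read_long, write_string and read_string are hypothetical helper names, not the Avro Ruby API. It shows the two places where multi-byte text typically goes wrong. First, the length prefix must be the UTF-8 byte count (String#bytesize), not the character count. Second, bytes read back from a binary buffer must be tagged as UTF-8 (String#force_encoding); otherwise Ruby treats them as raw binary.

```ruby
require 'stringio'

# Zig-zag + variable-length encoding of a long, used for the
# string length prefix (hypothetical helper, written from the spec).
def write_long(io, n)
  n = (n << 1) ^ (n >> 63)            # zig-zag: 0,-1,1,-2 -> 0,1,2,3
  while n > 0x7F
    io.write(((n & 0x7F) | 0x80).chr) # low 7 bits plus continuation bit
    n >>= 7
  end
  io.write(n.chr)
end

# Inverse of write_long (hypothetical helper).
def read_long(io)
  b = io.readbyte
  n = b & 0x7F
  shift = 7
  while (b & 0x80) != 0
    b = io.readbyte
    n |= (b & 0x7F) << shift
    shift += 7
  end
  (n >> 1) ^ -(n & 1)                 # undo zig-zag
end

# The prefix is the UTF-8 *byte* count, not the character count.
def write_string(io, str)
  utf8 = str.encode('UTF-8')
  write_long(io, utf8.bytesize)
  io.write(utf8.b)                    # write the raw bytes
end

# Bytes read from the buffer carry no text encoding; tag them as UTF-8.
def read_string(io)
  len = read_long(io)
  io.read(len).force_encoding('UTF-8')
end
```

Usage, with a binary buffer: write_string(io, "\u00e9") on StringIO.new(''.b) produces the bytes 0x04 0xC3 0xA9. The character count is 1, but the byte count is 2, which zig-zag encodes as 4.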

Thanks!

-Scott

On 1/4/12 3:59 AM, "kafka0102 kafka0102" <[EMAIL PROTECTED]> wrote:

> Hi.
> I use Avro's Java and Ruby clients. When they communicate, the Ruby client
> always encodes (and decodes) multi-byte UTF-8 characters as Latin-1. For now,
> when the data contains multi-byte characters, I first convert it with
> Iconv.conv("UTF8", "LATIN1",data) in the Ruby client, and then decode it with
> Utils.conv(data, "ISO-8859-1","UTF-8") in the Java server. It works, but it is
> ugly. I see that the Avro Ruby client uses StringIO to pack the data, but I
> cannot find a way to make it support multi-byte characters.
> Can anyone help me?
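The following small Ruby sketch shows why the workaround above round-trips. It uses String#encode and String#force_encoding in place of Iconv and the poster's Utils.conv helper, so it is an approximation of that workaround, not the poster's actual code. The sketch assumes the UTF-8 bytes end up labelled as Latin-1 somewhere on the Ruby side. Transcoding that "Latin-1" text to UTF-8 double-encodes it, and converting back to Latin-1 and relabelling the bytes as UTF-8 undoes the damage.

```ruby
original = "\u00e9"  # "é" as UTF-8: bytes C3 A9

# UTF-8 bytes mislabelled as Latin-1 read as two characters, "Ã" and "©".
mislabelled = original.dup.force_encoding('ISO-8859-1')

# The Iconv-style step: treat the text as Latin-1 and transcode it to UTF-8.
# Each byte becomes its own UTF-8 sequence (double encoding): C3 83 C2 A9.
double = mislabelled.encode('UTF-8')

# The reverse step on the other side: transcode back to Latin-1,
# which restores the original bytes, then label them as UTF-8 again.
recovered = double.encode('ISO-8859-1').force_encoding('UTF-8')

puts double.bytes.map { |b| format('%02X', b) }.join(' ')
puts recovered == original
```

The double encoding doubles the size of every non-ASCII byte on the wire. Tagging the bytes as UTF-8 in the first place, with a byte-count length prefix as the spec requires, avoids both conversions.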