Avro user mailing list: encoding problem for ruby client

kafka0102 kafka0102, 2012-01-04 11:59
Re: encoding problem for ruby client
This sounds like the Ruby implementation does not correctly use UTF-8 on
your platform for encoding strings.  It may be a bug, but I am not
knowledgeable enough on the Ruby implementation to know for sure.
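
As a quick check (a sketch assuming Ruby 1.9+, where every String carries an
encoding tag; on 1.8 strings are bare byte arrays, which may itself be the
problem), you can inspect what the client is actually handing to the wire:

  s = "héllo"        # a multi-byte UTF-8 string
  p s.encoding       # => #<Encoding:UTF-8> if the source file is UTF-8
  p s.bytes.to_a     # the raw bytes that should be written out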

The Avro specification states that "a string is encoded as a long followed
by that many bytes of UTF-8 encoded character data."
(http://avro.apache.org/docs/current/spec.html#binary_encode_primitive).
If you think that the Ruby implementation does not adhere to the spec,
please file a bug in JIRA.
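
For illustration, here is a minimal sketch of that rule in Ruby (not the
library's actual writer, and the method names are made up; assumes Ruby
1.9+): a zig-zag varint for the byte length, followed by the raw UTF-8 bytes.

  # Sketch only: Avro string = zig-zag varint byte length + UTF-8 bytes.
  def zigzag_varint(n)
    n = (n << 1) ^ (n >> 63)             # zig-zag-encode the long
    out = ''.force_encoding(Encoding::BINARY)
    while (n & ~0x7F) != 0
      out << ((n & 0x7F) | 0x80)         # low 7 bits plus continuation bit
      n >>= 7
    end
    out << n
  end

  def encode_avro_string(str)
    utf8 = str.encode('UTF-8')           # transcode to UTF-8 if needed
    zigzag_varint(utf8.bytesize) + utf8.dup.force_encoding(Encoding::BINARY)
  end

  p encode_avro_string("héllo").bytes.to_a
  # => [12, 104, 195, 169, 108, 108, 111]   (length 6 zig-zags to 12)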

Thanks!

-Scott

On 1/4/12 3:59 AM, "kafka0102 kafka0102" <[EMAIL PROTECTED]> wrote:

> Hi.
> I use Avro's Java and Ruby clients. When they communicate, the Ruby client
> always encodes (and decodes) multi-byte UTF-8 characters as Latin-1. For now,
> when the data contains multi-byte characters, I first encode it with
> Iconv.conv("UTF8", "LATIN1", data) in the Ruby client, and then decode it with
> Utils.conv(data, "ISO-8859-1", "UTF-8") in the Java server. It works, but it's
> ugly. I see the Avro Ruby client uses StringIO to pack the data, but I cannot
> find a way to make it support multi-byte characters.
> Can anyone help me?
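
For reference, the Iconv round-trip quoted above re-labels bytes that are
already valid UTF-8. On Ruby 1.9+ the usual fix is to tag the string with its
real encoding instead of transcoding twice (a sketch; the mislabeled string
here is a stand-in for whatever the client hands back; on 1.8, Iconv is the
standard tool):

  data  = "caf\xC3\xA9".force_encoding('ISO-8859-1')   # UTF-8 bytes, wrong label
  fixed = data.force_encoding(Encoding::UTF_8)         # re-tag, bytes unchanged
  raise 'bytes are not valid UTF-8' unless fixed.valid_encoding?
  p fixed    # => "café"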