Avro >> mail # user >> Support for char[] and short[] - Java


Re: Support for char[] and short[] - Java
You can safely cast both short and char to int and back, and use Avro's int
type.  The values will be encoded as variable-length integers and take 1 to 3
bytes each in binary form.
Wrapping a char[] or short[] into a List<Integer> or int[] will be clunky for
the user, however.  Another option would be to extend the reader to look for
special metadata in the schema indicating that an array of int is to be
interpreted as shorts or chars.
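
To make the widening idea concrete, here is a minimal sketch in plain Java with no Avro dependency. The class and method names (ShortCharWidening, toInts, toShorts, toChars, encodedSize) are made up for illustration and are not part of Avro. encodedSize assumes the zig-zag variable-length coding that the Avro specification describes for int, and only counts bytes rather than writing them.

```java
import java.util.Arrays;

// Hypothetical helper, not an Avro API: widens short[]/char[] to int[] and back.
public class ShortCharWidening {

    // Widen each short to an int; sign and value are preserved.
    static int[] toInts(short[] in) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) out[i] = in[i];
        return out;
    }

    // Widen each char to an int in the range 0..65535.
    static int[] toInts(char[] in) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) out[i] = in[i];
        return out;
    }

    // Narrow back to short; lossless only if every value came from a short.
    static short[] toShorts(int[] in) {
        short[] out = new short[in.length];
        for (int i = 0; i < in.length; i++) out[i] = (short) in[i];
        return out;
    }

    // Narrow back to char; lossless only if every value came from a char.
    static char[] toChars(int[] in) {
        char[] out = new char[in.length];
        for (int i = 0; i < in.length; i++) out[i] = (char) in[i];
        return out;
    }

    // Number of bytes a zig-zag varint encoding of n would occupy
    // (7 payload bits per byte). Assumption: this mirrors Avro's int coding.
    static int encodedSize(int n) {
        int z = (n << 1) ^ (n >> 31); // zig-zag: small magnitudes map to small codes
        int size = 1;
        while ((z & ~0x7F) != 0) {
            z >>>= 7;
            size++;
        }
        return size;
    }

    public static void main(String[] args) {
        short[] shorts = {0, -1, 1000, Short.MIN_VALUE, Short.MAX_VALUE};
        char[] chars = {'a', '\u00e9', '\uffff'};

        // Round trips through int[] give back the original arrays.
        System.out.println(Arrays.equals(shorts, toShorts(toInts(shorts))));
        System.out.println(Arrays.equals(chars, toChars(toInts(chars))));

        // Per-element encoded size under the zig-zag varint assumption.
        for (int v : toInts(shorts)) {
            System.out.println(v + " -> " + encodedSize(v) + " byte(s)");
        }
    }
}
```

Under that assumption, values from -64 to 63 fit in one byte, while Short.MIN_VALUE, Short.MAX_VALUE and Character.MAX_VALUE each need three, which matches the 1 to 3 byte range mentioned above.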

Can you give an example where a char[] converted to UTF-8 bytes and back
results in a loss of data?  I was under the impression that UTF-16 surrogate
pairs are converted to proper UTF-8 sequences and back to surrogate pairs.
Or are you using char to represent something else, such as a two-byte
unsigned quantity, where interpreting it as UTF-16 causes the problem?
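
For reference, a small plain-Java check of the round trip in question (the class and method names Utf8RoundTrip and roundTrip are invented for this sketch, and it uses the JDK's String conversion rather than Avro's own string handling). It compares a well-formed surrogate pair with an unpaired surrogate, which is the kind of value that turns up when char is used as a raw 16-bit quantity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical check, not an Avro API: char[] -> UTF-8 bytes -> char[].
public class Utf8RoundTrip {

    // Encode the chars as UTF-8 via String, then decode them again.
    static char[] roundTrip(char[] in) {
        byte[] utf8 = new String(in).getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8).toCharArray();
    }

    public static void main(String[] args) {
        // A valid surrogate pair (U+1F600) surrounded by ASCII.
        char[] paired = {'a', '\uD83D', '\uDE00', 'b'};
        // A high surrogate with no low surrogate after it: not valid UTF-16 text.
        char[] unpaired = {'a', '\uD800', 'b'};

        System.out.println("paired survives:   "
                + Arrays.equals(paired, roundTrip(paired)));
        System.out.println("unpaired survives: "
                + Arrays.equals(unpaired, roundTrip(unpaired)));
    }
}
```

String.getBytes(Charset) is documented to substitute the charset's replacement bytes for malformed input, so the well-formed pair should come back intact while the unpaired surrogate should not. If the char[] holds arbitrary 16-bit values rather than text, that substitution would be one plausible source of the mismatch.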

On 12/23/12 10:30 PM, "Tarun Gupta" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am new to Avro, but I did some basic research on how to support data
> types like char arrays and short arrays when defining an Avro schema. Issue
> AVRO-249 sounded somewhat relevant, but it is about supporting short through
> the reflection API.
>
> We are planning to use Avro for a Java-based client-server data exchange use
> case. Note that our data model is expected to have "large arrays" of short and
> char, and performance is our key concern. We can't use a string to store a
> char[], because what we get back is different from what we put in, because of
> "UTF-16 normalization".
>
> Thanks in Advance.
> Tarun Gupta