Avro, mail # user - Support for char[] and short[] - Java


Re: Support for char[] and short[] - Java
Scott Carey 2013-01-08, 09:08
You can cast both short and char safely to int and back, and use Avro's int
type.  These are written with a variable-length zig-zag integer encoding and
take 1 to 3 bytes in binary form per short/char.
However, wrapping char[] or short[] into List<Integer> or int[] is clunky for
the user.  Another option would be to extend the reader to look for special
metadata in the schema indicating that an array of int should be interpreted
as shorts or chars.
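
To make the int approach concrete, here is a minimal plain-Java sketch; it
does not call the Avro library.  It widens short[] and char[] to int[],
narrows them back, and estimates the bytes per value under the zig-zag
variable-length scheme that the Avro specification describes for int.  The
class and method names (IntWidening, widen, toShorts, toChars,
zigZagVarintSize) are made up for illustration.

```java
import java.util.Arrays;

public class IntWidening {
    // Widen each short to int; the sign is preserved by the implicit conversion.
    static int[] widen(short[] in) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) out[i] = in[i];
        return out;
    }

    // Widen each char to int; char is unsigned, so results are in 0..65535.
    static int[] widen(char[] in) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) out[i] = in[i];
        return out;
    }

    // Narrow back to short; lossless only if every value came from a short.
    static short[] toShorts(int[] in) {
        short[] out = new short[in.length];
        for (int i = 0; i < in.length; i++) out[i] = (short) in[i];
        return out;
    }

    // Narrow back to char; lossless only if every value came from a char.
    static char[] toChars(int[] in) {
        char[] out = new char[in.length];
        for (int i = 0; i < in.length; i++) out[i] = (char) in[i];
        return out;
    }

    // Number of bytes a zig-zag varint needs for n (7 payload bits per byte).
    static int zigZagVarintSize(int n) {
        int z = (n << 1) ^ (n >> 31);   // zig-zag: small magnitudes map to small codes
        int size = 1;
        while ((z & ~0x7F) != 0) {
            z >>>= 7;
            size++;
        }
        return size;
    }

    public static void main(String[] args) {
        short[] shorts = {0, -1, 63, 64, Short.MIN_VALUE, Short.MAX_VALUE};
        char[] chars = {'a', (char) 0x20AC, Character.MAX_VALUE};

        System.out.println(Arrays.equals(shorts, toShorts(widen(shorts))));
        System.out.println(Arrays.equals(chars, toChars(widen(chars))));
        for (short s : shorts) {
            System.out.println(s + " -> " + zigZagVarintSize(s) + " byte(s)");
        }
    }
}
```

Values in -64..63 fit in one byte, and the extremes of the short and char
ranges need three, which is where the 1 to 3 byte figure comes from.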

Can you give an example where a char[] converted to utf8 bytes and back
results in a loss of data?  I was under the impression that UTF-16 surrogate
pairs are converted to proper UTF-8 sequences and back to surrogate pairs.
Or, are you using char to represent something else, as a two byte unsigned
quantity where interpreting as UTF-16 causes the problem?
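
For reference, the round trip in question can be checked with a few lines of
plain Java, independent of Avro.  The sketch below assumes the conversion goes
through String.getBytes(StandardCharsets.UTF_8) and
new String(bytes, StandardCharsets.UTF_8); the class name Utf8RoundTrip and
the helper roundTrip are illustrative only.  It compares a well-formed
surrogate pair with a char[] that contains an unpaired surrogate, which is
what can turn up when char is used as a raw 16-bit quantity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {
    // Encode the chars as UTF-8 bytes via String, then decode them again.
    static char[] roundTrip(char[] in) {
        byte[] utf8 = new String(in).getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8).toCharArray();
    }

    public static void main(String[] args) {
        // A valid high/low surrogate pair (one supplementary code point).
        char[] pair = {(char) 0xD83D, (char) 0xDE00};
        // A high surrogate with no low surrogate after it: not valid UTF-16 text.
        char[] lone = {'a', (char) 0xD800, 'b'};

        System.out.println("pair survives: " + Arrays.equals(pair, roundTrip(pair)));
        System.out.println("lone survives: " + Arrays.equals(lone, roundTrip(lone)));
    }
}
```

The pair comes back unchanged.  The unpaired surrogate does not, because
String.getBytes substitutes the charset's replacement bytes for malformed
input, so arbitrary 16-bit data stored in char[] is not safe to carry as a
string.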

On 12/23/12 10:30 PM, "Tarun Gupta" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am new to Avro, but I did some basic research on how to support data types
> like Char arrays and Short arrays when defining an Avro schema. Issue
> AVRO-249 sounded somewhat relevant, but it's about supporting Short using the
> reflection API.
>
> We are planning to use Avro for a Java-based client-server data exchange use
> case. Note that our data model is expected to have "large arrays" of Short and
> Char, and performance is our 'key concern'. We can't use a string to store
> char[], because what we get back is different from what we put in, because of
> "UTF-16 normalization".
>
> Thanks in Advance.
> Tarun Gupta