Support for char[] and short[] - Java
Tarun Gupta 2012-12-24, 06:30
Hi,
I am new to Avro, but I did some basic research on how to support data types like char arrays and short arrays when defining an Avro schema. Issue AVRO-249 sounded somewhat relevant, but it's about supporting Short through the reflection API.
We are planning to use Avro for a Java-based client-server data exchange use case. Note that our data model is expected to have "large arrays" of Short and Char, and performance is our key concern. We can't use a string to store a char[], because what we get back is different from what we put in, due to "UTF-16 normalization".
Thanks in advance.
Tarun Gupta
Re: Support for char[] and short[] - Java
Scott Carey 2013-01-08, 09:08
You can cast both short and char safely to int and back, and use Avro's int type. These will be variable-length integer encoded and take 1 to 3 bytes in binary form per short/char. It will be clunky for a user to wrap char[] or short[] into List<Integer> or int[], however. Another option would be to extend the reader to look for special metadata in the schema indicating that an array of int is to be interpreted as shorts or chars.
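A minimal sketch of the widening approach, in plain Java with no Avro dependency. The class `IntWidening` and its methods are hypothetical helpers, not part of Avro. `encodedSize` mirrors the zig-zag variable-length scheme that Avro's binary encoding uses for int, so it can be used to check the 1 to 3 byte estimate for values in the short and char ranges.

```java
// Hypothetical helper (not an Avro class): widen char[]/short[] to int[]
// so the data can be stored in an Avro array of int, and narrow it back.
final class IntWidening {

    static int[] toInts(char[] src) {
        int[] out = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            out[i] = src[i];            // char widens to 0..65535
        }
        return out;
    }

    static int[] toInts(short[] src) {
        int[] out = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            out[i] = src[i];            // short widens to -32768..32767
        }
        return out;
    }

    static char[] toChars(int[] src) {
        char[] out = new char[src.length];
        for (int i = 0; i < src.length; i++) {
            out[i] = (char) src[i];     // lossless if the value came from a char
        }
        return out;
    }

    static short[] toShorts(int[] src) {
        short[] out = new short[src.length];
        for (int i = 0; i < src.length; i++) {
            out[i] = (short) src[i];    // lossless if the value came from a short
        }
        return out;
    }

    // Number of bytes a zig-zag, 7-bits-per-byte varint needs for n.
    static int encodedSize(int n) {
        int z = (n << 1) ^ (n >> 31);   // zig-zag: small magnitudes map to small codes
        int size = 1;
        while ((z & ~0x7F) != 0) {      // more than 7 significant bits remain
            z >>>= 7;
            size++;
        }
        return size;
    }
}
```

With this sketch, values between -64 and 63 take one byte, and the extremes of both ranges (`Short.MIN_VALUE`, `Character.MAX_VALUE`) take three, so arrays dominated by small values would be more compact than a fixed two bytes per element, while arrays of large values would be larger.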
Can you give an example where a char[] converted to UTF-8 bytes and back results in a loss of data? I was under the impression that UTF-16 surrogate pairs are converted to proper UTF-8 sequences and back to surrogate pairs. Or are you using char to represent something else, such as a two-byte unsigned quantity, where interpreting it as UTF-16 causes the problem?
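One case that might be what is being described, sketched with plain `java.lang.String` only: a char[] holding arbitrary 16-bit values can contain an unpaired surrogate, which is not valid UTF-16 and so has no UTF-8 form. `String.getBytes(Charset)` substitutes a replacement for such input, so the round trip changes the data. The class name `Utf8RoundTrip` is hypothetical, and whether Avro's own string handling takes the same path is an assumption here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: round-trips a char[] through UTF-8 bytes.
final class Utf8RoundTrip {

    static char[] roundTrip(char[] src) {
        byte[] utf8 = new String(src).getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8).toCharArray();
    }

    public static void main(String[] args) {
        // Well-formed text: a valid surrogate pair (U+1F600) survives.
        char[] text = {'h', '\uD83D', '\uDE00'};
        System.out.println(Arrays.equals(text, roundTrip(text)));

        // Raw 16-bit data: 0xD800 is a lone high surrogate, so the
        // encoder replaces it and the original value is lost.
        char[] raw = {0x0041, 0xD800, 0x0042};
        System.out.println(Arrays.equals(raw, roundTrip(raw)));
    }
}
```

The first comparison holds and the second does not, which suggests that text data is safe in a string while char[] used as raw unsigned 16-bit values is not.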
On 12/23/12 10:30 PM, "Tarun Gupta" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am new Avro but I did some basic research regarding how do we a support data
> types like Char arrays and Short arrays while defining the Avro schema. Issue
> # AVRO-249 sounded somewhat relevant but its about supporting Short using the
> reflection API.
>
> We are planning to use Avro for a Java based Client Server data exchange use
> case, note that our data model is expected to have "large arrays" of Short and
> Char, and performance is our 'key concern'. We can't use a string to store
> char[], because what we get back is different then what you put in, because of
> "UTF-16 normalization".
>
> Thanks in Advance.
> Tarun Gupta