Tarun Gupta 2012-12-24, 06:30
You can safely cast both short and char to int and back, and use Avro's int
type. The values are written with Avro's variable-length zig-zag integer
encoding and take 1 to 3 bytes in binary form per short/char.
However, it will be clunky for a user to wrap each char or short in an int
or a List<Integer>. Another option would be to extend the reader to look for
special metadata in the schema indicating that an array of int is to be
interpreted as shorts or chars.
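As a rough sketch of the casting approach and the 1 to 3 byte claim, the
following uses only the JDK and no Avro API. The class and method names
(ShortCharAsInt, widen, narrowToShort, narrowToChar, encodedSize) are
hypothetical and exist only for illustration; encodedSize just counts the
bytes a zig-zag varint would occupy, it does not call Avro's encoder.

```java
// Hypothetical helper, not part of Avro: widens short/char arrays to int
// and estimates the size of each value under zig-zag varint encoding.
final class ShortCharAsInt {

    // Widening a short to an int is lossless (sign-extended).
    static int[] widen(short[] values) {
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i];
        }
        return out;
    }

    // Widening a char to an int is lossless (zero-extended, 0..65535).
    static int[] widen(char[] values) {
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i];
        }
        return out;
    }

    // Narrowing back is exact as long as each int came from a short.
    static short[] narrowToShort(int[] values) {
        short[] out = new short[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (short) values[i];
        }
        return out;
    }

    // Narrowing back is exact as long as each int came from a char.
    static char[] narrowToChar(int[] values) {
        char[] out = new char[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (char) values[i];
        }
        return out;
    }

    // Zig-zag maps small negative and positive ints to small unsigned values.
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Bytes needed for a zig-zag varint: 7 payload bits per byte.
    static int encodedSize(int n) {
        int z = zigZag(n);
        int size = 1;
        while ((z & ~0x7F) != 0) {
            z >>>= 7;
            size++;
        }
        return size;
    }
}
```

Under this sketch, values between -64 and 63 take one byte, and the extremes
of both short and char take three, so arrays of mostly small values come out
smaller than a fixed two bytes per element while arrays of large values come
out larger.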
Can you give an example where a char converted to UTF-8 bytes and back
results in a loss of data? I was under the impression that UTF-16 surrogate
pairs are converted to proper UTF-8 sequences and back to surrogate pairs.
Or are you using char to represent something else, such as a two-byte
unsigned quantity, where interpreting it as UTF-16 causes the problem?
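One way to check this, sketched with the JDK's own String conversion rather
than Avro's string handling (which may behave differently), is to round-trip
a char array through UTF-8 and compare. The class and method names
(Utf8RoundTrip, roundTrip, survives) are hypothetical. The assumption being
tested is that the chars are arbitrary 16-bit values, so an unpaired
surrogate can occur.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: round-trips chars through UTF-8 using the JDK only.
final class Utf8RoundTrip {

    // Encodes the chars as UTF-8 bytes and decodes them back.
    static char[] roundTrip(char[] chars) {
        byte[] utf8 = new String(chars).getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8).toCharArray();
    }

    // True if the chars come back unchanged after the round trip.
    static boolean survives(char[] chars) {
        return Arrays.equals(chars, roundTrip(chars));
    }
}
```

With this helper, a well-formed surrogate pair such as 0xD83D followed by
0xDE00 comes back unchanged, while a lone 0xD800 does not, because
String.getBytes substitutes a replacement for malformed input. That would
explain data loss if char is being used as a raw two-byte quantity.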
On 12/23/12 10:30 PM, "Tarun Gupta" <[EMAIL PROTECTED]> wrote:
> I am new to Avro, but I did some basic research on how to support data
> types like Char arrays and Short arrays when defining an Avro schema. Issue
> AVRO-249 sounded somewhat relevant, but it's about supporting Short using the
> reflection API.
> We are planning to use Avro for a Java-based client-server data exchange use
> case. Note that our data model is expected to have "large arrays" of Short and
> Char, and performance is our 'key concern'. We can't use a string to store
> chars, because what we get back is different from what we put in, due to
> "UTF-16 normalization".
> Thanks in Advance.
> Tarun Gupta