Avro, mail # user - GenericDatumWriter.write and the Integer type

GenericDatumWriter.write and the Integer type
Mark Hayes 2012-06-09, 15:58
Hi, I have a question about the treatment of Integer types (defined as
'int' in the schema) when serializing with GenericDatumWriter.  The
behavior changed in the 1.6 code line.

The change was apparently to address this issue:

Here is the diff:

Instead of casting the datum to Integer, as in the old code:

    case INT: out.writeInt((Integer)datum); break;

The new code casts to Number:

    case INT: out.writeInt(((Number)datum).intValue()); break;

If the datum is a Long, Float or Double, intValue() silently narrows the
value: a Float or Double loses its fractional part, and a Long outside the
int range loses its high-order bits.  I would rather an exception were
thrown, as happened with the old code, so that the user is aware they've
attempted to serialize a value that can't be represented as an int.
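
To make the difference concrete, here is a minimal sketch in plain Java with no Avro dependency. The class and method names (IntCastDemo, viaInteger, viaNumber) are made up for illustration; the two methods only mirror the two cast expressions quoted above, not the actual GenericDatumWriter code.

```java
// Hypothetical demo class: mirrors the two cast styles, nothing Avro-specific.
final class IntCastDemo {
    private IntCastDemo() {}

    // Old style: cast to Integer. Fails with ClassCastException
    // when the datum is a Long, Float or Double.
    static int viaInteger(Object datum) {
        return (Integer) datum;
    }

    // New style: cast to Number, then narrow with intValue().
    // Accepts any Number and silently drops information.
    static int viaNumber(Object datum) {
        return ((Number) datum).intValue();
    }

    public static void main(String[] args) {
        // The fractional part is dropped.
        System.out.println(viaNumber(3.9d));
        // 2^32 + 2 does not fit in an int; only the low 32 bits survive.
        System.out.println(viaNumber(4294967298L));

        try {
            viaInteger(3.9d);
        } catch (ClassCastException e) {
            System.out.println("old-style cast rejected a Double");
        }
    }
}
```

With the Number cast, 3.9 is written as 3 and 4294967298L as 2, with no error in either case.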

I can override GenericDatumWriter.write to address this, essentially
reverting to the old behavior.
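
As a rough sketch of the check such an override might perform for the INT case, something like the following could work. StrictInt and toIntStrict are hypothetical names, not part of Avro, and how the helper is wired into a GenericDatumWriter subclass is left out because it depends on the Avro version in use.

```java
// Hypothetical helper, not an Avro API: strict conversion for schema type int.
final class StrictInt {
    private StrictInt() {}

    // Accepts only datums that are already java.lang.Integer.
    // Anything else (Long, Float, Double, null, ...) is rejected loudly
    // instead of being narrowed. Rejecting null with the same exception
    // is a choice made for this sketch.
    static int toIntStrict(Object datum) {
        if (datum instanceof Integer) {
            return (Integer) datum;
        }
        String type = (datum == null) ? "null" : datum.getClass().getName();
        throw new ClassCastException(
            "Expected java.lang.Integer for schema type int, got " + type);
    }

    public static void main(String[] args) {
        System.out.println(toIntStrict(42));
        try {
            toIntStrict(42L);
        } catch (ClassCastException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

An overridden write could call a helper like this before handing the value to out.writeInt, so that a mistyped datum fails at serialization time.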

But is it appropriate to rely on the casting errors as a form of cheap
validation?  Or would the recommended approach be to use a
ValidatingEncoder instead?