

GenericDatumWriter.write and the Integer type
Hi, I have a question about the treatment of Integer types (defined as
'int' in the schema) when serializing with GenericDatumWriter.  The
behavior changed in the 1.6 code line.

The change was apparently to address this issue:
https://issues.apache.org/jira/browse/AVRO-249

Here is the diff:
http://svn.apache.org/viewvc/avro/trunk/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumWriter.java?r1=1078917&r2=1178973&pathrev=1178973&diff_format=h

Instead of casting the datum to Integer in the old code:
case INT:     out.writeInt((Integer)datum);     break;

The new code casts to Number:
case INT:     out.writeInt(((Number)datum).intValue()); break;

If the datum is a Long, Float or Double, the intValue() method silently
narrows it: a Long loses its high-order bits, and a Float or Double loses
its fractional part.  I would rather an exception be thrown, which is what
happened in the old code, so the user is aware that they've attempted to
serialize a value that can't be represented.
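
To make the difference concrete, here is a small sketch in plain Java with no Avro classes involved. The class and method names (IntNarrowingDemo, viaNumber, viaInteger) are made up for illustration; they only mirror the two cast styles quoted above.

```java
// Illustrative only: plain JDK, no Avro classes involved.
class IntNarrowingDemo {
    // Mirrors the new-style conversion: any Number is accepted and narrowed.
    static int viaNumber(Object datum) {
        return ((Number) datum).intValue();
    }

    // Mirrors the old-style conversion: anything other than an Integer
    // fails with a ClassCastException.
    static int viaInteger(Object datum) {
        return (Integer) datum;
    }

    public static void main(String[] args) {
        // A Double loses its fractional part.
        System.out.println(viaNumber(3.99d));        // prints 3
        // A Long outside the int range loses its high-order bits (2^32 + 1).
        System.out.println(viaNumber(4294967297L));  // prints 1
        try {
            viaInteger(4294967297L);
        } catch (ClassCastException e) {
            System.out.println("rejected: " + e.getClass().getSimpleName());
        }
    }
}
```

Neither narrowing case produces a warning or an exception, which is the silent loss described above.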

I can override GenericDatumWriter.write to address this (essentially revert
to the old code behavior).
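
A minimal sketch of the check such an override could apply to the INT case. StrictInts and requireInt are hypothetical names, not part of Avro. The idea, which I have not verified against a specific release, is that a subclass overriding the protected write(Schema, Object, Encoder) method would call this helper for int fields and delegate to super for everything else.

```java
// Hypothetical helper, not part of Avro: restores the old "Integer only"
// behaviour for values written to an 'int' field.
class StrictInts {
    static int requireInt(Object datum) {
        if (datum instanceof Integer) {
            return (Integer) datum;
        }
        // Same exception type the old cast produced, with a clearer message.
        String type = (datum == null) ? "null" : datum.getClass().getName();
        throw new ClassCastException(
            "Expected java.lang.Integer for an 'int' field but got " + type);
    }
}
```

For example, requireInt(7) returns 7, while requireInt(7L) throws rather than narrowing the Long.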

But is it appropriate to rely on the cast errors as a cheap form of
validation?  Or would the recommended approach be to use a
ValidatingEncoder instead?

Thanks,
--mark
Doug Cutting 2012-06-11, 20:13