Avro >> mail # user >> avro python numpy

avro python numpy
Hello all,
I am using Avro with Python to send data to our servers. I noticed that Avro does not accept some NumPy datatypes, which is unfortunate since they are often used for handling larger datasets in Python. For example, numpy.float32 stores 32-bit floats in arrays/tables and maps naturally to Avro's float (also 32-bit), yet Avro will not accept it. To store a numpy.float32 in an Avro float I first have to convert it to a Python float (which is 64-bit). It would be nice if Avro understood these datatypes.

Example in python:

import numpy
import avro.schema
import avro.io
import cStringIO

writer = avro.io.DatumWriter(avro.schema.PrimitiveSchema('float'))
reader = avro.io.DatumReader(avro.schema.PrimitiveSchema('float'))
buff = cStringIO.StringIO()

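# These three writes are accepted.
writer.write(10, avro.io.BinaryEncoder(buff))
writer.write(numpy.int32(10), avro.io.BinaryEncoder(buff))
writer.write(numpy.float64(10), avro.io.BinaryEncoder(buff))

# This one is rejected: unlike numpy.float64, numpy.float32 is not a subclass of Python's float.
writer.write(numpy.float32(10), avro.io.BinaryEncoder(buff)) # does not work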
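
Until Avro handles these types itself, one possible workaround is to convert NumPy values to built-in Python objects just before writing. The sketch below is only an illustration: `to_native` is a hypothetical helper name, not part of Avro or NumPy. It relies on the `tolist()` method that NumPy scalars and arrays provide, which returns the equivalent built-in Python value (or nested lists of them). Widening a float32 to a 64-bit Python float is exact, so nothing is lost when the value is later narrowed back to a 32-bit Avro float.

```python
def to_native(value):
    """Convert NumPy scalars/arrays (possibly nested) to built-in Python objects.

    Hypothetical helper for illustration; it is not part of Avro or NumPy.
    """
    # Recurse into containers, since records and arrays may hold NumPy values.
    if isinstance(value, dict):
        return {key: to_native(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_native(item) for item in value]
    # NumPy scalars and arrays expose tolist(), which yields plain Python
    # objects. Assumption: any object offering tolist() behaves this way.
    tolist = getattr(value, "tolist", None)
    if callable(tolist):
        return tolist()
    # Everything else (int, float, str, None, ...) passes through unchanged.
    return value
```

With this in place, the failing call from the example would become `writer.write(to_native(numpy.float32(10)), avro.io.BinaryEncoder(buff))`.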