Avro >> mail # user >> avro python numpy


avro python numpy
Hello all,
I am using Avro with Python to send data to our servers. I noticed that Avro does not understand some numpy datatypes, which is unfortunate since they are often used for handling larger datasets in Python. For example, numpy.float32 is used in arrays/tables to store 32-bit floats and maps nicely to float in Avro, yet Avro will not accept it. To store a numpy.float32 in an Avro float I first have to convert it to a Python float (which is 64-bit). It would be nice if Avro understood these datatypes.
Best,
Koert

Example in python:

import numpy
import avro.schema
import avro.io
import cStringIO

writer = avro.io.DatumWriter(avro.schema.PrimitiveSchema('float'))
reader = avro.io.DatumReader(avro.schema.PrimitiveSchema('float'))
buff = cStringIO.StringIO()

writer.write(10, avro.io.BinaryEncoder(buff))
writer.write(numpy.int32(10), avro.io.BinaryEncoder(buff))
writer.write(numpy.float64(10), avro.io.BinaryEncoder(buff))

writer.write(numpy.float32(10), avro.io.BinaryEncoder(buff)) # does not work
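
Until Avro accepts these types directly, the conversion described above can be wrapped in a small helper. This is only a sketch: to_native is a hypothetical name, not part of avro or numpy. It relies on the item() method that numpy scalar types provide, which returns the equivalent built-in Python value.

```python
def to_native(value):
    """Return a built-in Python scalar for a numpy scalar.

    Hypothetical helper, not part of avro or numpy. Values without a
    callable item() method are passed through unchanged.
    """
    # numpy scalar types (numpy.float32, numpy.int32, ...) expose item(),
    # which returns the matching built-in Python int, float or bool.
    item = getattr(value, "item", None)
    if callable(item):
        return item()
    return value
```

With the writer and buffer from the example, the failing call would become writer.write(to_native(numpy.float32(10)), avro.io.BinaryEncoder(buff)). Widening a 32-bit float to a Python float is exact, so the value should not change when Avro encodes it as a 32-bit float again. Note that numpy arrays also have item(), which only works for single-element arrays, so the helper is meant for scalars only.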