Avro, mail # user - Writing in Cpp and reading in Python


Writing in Cpp and reading in Python
Gaurav Nanda 2012-05-14, 21:11
Hi,

I am using the following schema to write in C++ and read in Python.

{
"type": "record",
"name": "jok_obj",
"fields" : [
            {"name" : "val", "type": ["null", "boolean", "long", "int",
                                      "double", "float", "string",

                                      {"name" : "date", "type" : "record",
                                       "fields" : [
                                                   {"name" : "value", "type" : "int"}
                                                  ]
                                      },

                                      {"name" : "datetime", "type" : "record",
                                       "fields" : [
                                                   {"name" : "date", "type" : "int"},
                                                   {"name" : "tics", "type" : "int"}
                                                  ]
                                      },

                                      {"name" : "timestamp", "type" : "record",
                                       "fields" : [
                                                   {"name" : "sec", "type" : "long"},
                                                   {"name" : "microsec", "type" : "long"}
                                                  ]
                                      },

                                      {"type" : "map",   "values" : "jok_obj"},
                                      {"type" : "array", "items" : "jok_obj"}
                                      ]
            }
        ]
}
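For reference, the Avro specification encodes a union value as the zero-based position of the chosen branch, written as an Avro int (zig-zag, then variable-length), followed by the value itself. A minimal sketch, assuming the branch order of the "val" union above (`kValBranches`, `unionBranchIndex` and `encodeZigZagVarint` are hypothetical helpers, not part of the Avro C++ API), shows why the "double" branch becomes the single byte 0x08:

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical transcription of the "val" union, in schema order:
// named records appear by name, the map and array branches by type.
const std::vector<std::string> kValBranches = {
    "null", "boolean", "long", "int", "double", "float", "string",
    "date", "datetime", "timestamp", "map", "array"};

// Zero-based position of a branch, or -1 if the name is not in the union.
int64_t unionBranchIndex(const std::string& name) {
    auto it = std::find(kValBranches.begin(), kValBranches.end(), name);
    if (it == kValBranches.end()) return -1;
    return static_cast<int64_t>(it - kValBranches.begin());
}

// Avro int/long encoding: zig-zag maps signed to unsigned, then 7 bits
// are emitted per byte, low-order group first, high bit set on all but the last.
std::string encodeZigZagVarint(int64_t n) {
    uint64_t z = (static_cast<uint64_t>(n) << 1) ^ static_cast<uint64_t>(n >> 63);
    std::string out;
    while (z >= 0x80) {
        out.push_back(static_cast<char>((z & 0x7F) | 0x80));
        z >>= 7;
    }
    out.push_back(static_cast<char>(z));
    return out;
}
```

With this ordering, `encodeZigZagVarint(unionBranchIndex("double"))` yields the one byte 0x08; the 8 bytes of the double itself are expected to follow it.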

I encode the C++ object into a memoryOutputStream, read it back through a
memoryInputStream with a StreamReader, and finally convert it to a
std::string. When I decode that string in C++ it works fine, but decoding
it in Python fails.
====
    std::string AvroObj::encode()
    {
        // Binary-encode obj into an in-memory output stream.
        std::auto_ptr<avro::OutputStream> out = avro::memoryOutputStream();
        avro::EncoderPtr e = avro::binaryEncoder();
        e->init(*out);
        avro::encode(*e, obj);

        // Read the encoded bytes back and collect them into a string.
        // A stack-allocated StreamReader avoids leaking a heap object.
        std::auto_ptr<avro::InputStream> in = avro::memoryInputStream(*out);
        avro::StreamReader reader(*in);

        std::stringstream ss;
        while (reader.hasMore()) {
            ss << reader.read();
        }

        return ss.str();
    }

====
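Because 0.0 is encoded as eight zero bytes, the output of `encode()` legitimately contains NUL characters. `std::string` and `std::stringstream` hold them fine, but any step that hands the data on as a NUL-terminated `char*` (for example `c_str()` without the length) would cut it down to just "\x08". This message does not show how the string travels to Python, so this is only one thing to check; another, also unconfirmed here, is that the Avro C++ encoder may hold bytes in an internal buffer, and `avro::Encoder` provides `flush()`, so calling `e->flush()` right after `avro::encode(*e, obj)` and before reading `out` back is worth trying. The sketch below does not use Avro at all (`kPythonEncoding`, `copyViaCString` and `copyWithLength` are hypothetical) and shows the truncation using the 9 bytes Python produced:

```cpp
#include <string>

// The 9 bytes Python produced for {"val" : 0.0}: branch byte 0x08
// followed by the 8 zero bytes of the double 0.0.
const std::string kPythonEncoding("\x08\0\0\0\0\0\0\0\0", 9);

// Lossy copy: a NUL-terminated char* ends at the first zero byte.
std::string copyViaCString(const std::string& encoded) {
    const char* p = encoded.c_str();
    return std::string(p);  // length is found by scanning for '\0'
}

// Lossless copy: pass the explicit length along with the data pointer.
std::string copyWithLength(const std::string& encoded) {
    return std::string(encoded.data(), encoded.size());
}
```

`copyViaCString(kPythonEncoding)` keeps only the leading 0x08, while `copyWithLength` preserves all 9 bytes; whatever transport is used should carry the length (or use a binary-safe buffer) in the same way.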
I am trying to encode {"val" : 0.0}, which encodes to "\x08". But when I
send this to Python, decoding fails with:

=============================================...
File "/u/nanda/jok/lib/python/*****/jok/rpc.py", line 451, in to_avro
    record = dr.read(decoder)
  File "/u/nanda/avro-src-1.6.1/lang/py/src/avro/io.py", line 445, in read
    return self.read_data(self.writers_schema, self.readers_schema, decoder)
  File "/u/nanda/avro-src-1.6.1/lang/py/src/avro/io.py", line 490, in read_data
    return self.read_record(writers_schema, readers_schema, decoder)
  File "/u/nanda/avro-src-1.6.1/lang/py/src/avro/io.py", line 690, in read_record
    field_val = self.read_data(field.type, readers_field.type, decoder)
  File "/u/nanda/avro-src-1.6.1/lang/py/src/avro/io.py", line 488, in read_data
    return self.read_union(writers_schema, readers_schema, decoder)
  File "/u/nanda/avro-src-1.6.1/lang/py/src/avro/io.py", line 654, in read_union
    return self.read_data(selected_writers_schema, readers_schema, decoder)
  File "/u/nanda/avro-src-1.6.1/lang/py/src/avro/io.py", line 458, in read_data
    return self.read_data(writers_schema, s, decoder)
  File "/u/nanda/avro-src-1.6.1/lang/py/src/avro/io.py", line 476, in read_data
    return decoder.read_double()
  File "/u/nanda/avro-src-1.6.1/lang/py/src/avro/io.py", line 218, in read_double
    ((ord(self.read(1)) & 0xffL) << 48) |
TypeError: ord() expected a character, but string of length 0 found
===============================================
While digging in more, I found that Python encodes {"val" : 0.0} as
"\x08\x00\x00\x00\x00\x00\x00\x00\x00". Any string shorter than
this gives the above error.
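The traceback is consistent with this: after the branch byte, `read_double` needs eight more bytes, and `ord()` fails as soon as `read(1)` reaches the end of the input and returns an empty string. A minimal sketch of the complete encoding, following the Avro specification's rule that a double is written as its IEEE 754 bit pattern in 8 little-endian bytes (`encodeAvroDouble` and `encodeValAsDouble` are hypothetical helpers), reproduces the 9-byte string Python produced:

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Avro "double": IEEE 754 binary64 bit pattern, 8 bytes, least significant first.
std::string encodeAvroDouble(double d) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);  // copy the bit pattern without aliasing issues
    std::string out;
    for (int i = 0; i < 8; ++i) {
        out.push_back(static_cast<char>((bits >> (8 * i)) & 0xFF));
    }
    return out;
}

// Hypothetical helper: bytes of a jok_obj whose "val" holds a double.
// Branch 4 ("double") zig-zag encodes to the single byte 0x08.
std::string encodeValAsDouble(double d) {
    return std::string(1, '\x08') + encodeAvroDouble(d);
}
```

`std::memcpy` is used instead of a pointer cast so the double's bits are read without violating strict aliasing; for 1.0 the bytes come out as 00 00 00 00 00 00 F0 3F.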

Could you please suggest what might be wrong?

Thanks,
Gaurav Nanda