Avro >> mail # user >> Writing in Cpp and reading in Python


Re: Writing in Cpp and reading in Python
Python has indeed reached end of file: it was expecting the nine bytes
"\x08\x00\x00\x00\x00\x00\x00\x00\x00" but received only "\x08", so it
complains.
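
For reference, the nine expected bytes can be reproduced without the Avro
library. This is only a minimal sketch, assuming the Avro binary encoding
rules (a union is written as the zigzag varint index of the chosen branch
followed by the value, and a double is 8 little-endian IEEE 754 bytes). The
helper names and the UNION_BRANCHES list are hypothetical and only mirror
the branch order of the schema quoted below.

```python
import struct

# Branch order of the "val" union in the quoted schema. The complex
# branches are listed by name only; just the position matters here.
UNION_BRANCHES = ["null", "boolean", "long", "int", "double", "float",
                  "string", "date", "datetime", "timestamp", "map", "array"]


def zigzag_varint(n):
    """Encode a signed integer the way Avro writes int/long values."""
    z = (n << 1) ^ (n >> 63)          # zigzag: small magnitudes stay small
    out = bytearray()
    while z & ~0x7F:                  # more than 7 bits left
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_double_branch(value):
    """Union branch index for "double", then the 8-byte double itself."""
    index = UNION_BRANCHES.index("double")
    return zigzag_varint(index) + struct.pack("<d", value)


expected = encode_double_branch(0.0)
print(repr(expected), len(expected))
```

The leading "\x08" is just the branch index 4 after zigzag encoding, so a
payload that stops there is missing the whole double value.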

I suspect the C++ snippet I shared is buggy somewhere. I think
I am not extracting the entire encoded string.
What is the recommended way to pass encoded data from C++ over the wire?

On Tue, May 15, 2012 at 11:37 PM, Miki Tebeka <[EMAIL PROTECTED]> wrote:
> Can you place an example avro file somewhere so I can take a look?
>
> From the error, it looks like Python has reached end of file and
> read(1) returned an empty string.
>
>
> On Mon, May 14, 2012 at 2:11 PM, Gaurav Nanda <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> I am using the following schema to write in C++ and read in Python.
>>
>> {
>>   "type": "record",
>>   "name": "jok_obj",
>>   "fields": [
>>     {"name": "val",
>>      "type": ["null", "boolean", "long", "int", "double", "float", "string",
>>               {"name": "date", "type": "record",
>>                "fields": [{"name": "value", "type": "int"}]},
>>               {"name": "datetime", "type": "record",
>>                "fields": [{"name": "date", "type": "int"},
>>                           {"name": "tics", "type": "int"}]},
>>               {"name": "timestamp", "type": "record",
>>                "fields": [{"name": "sec", "type": "long"},
>>                           {"name": "microsec", "type": "long"}]},
>>               {"type": "map", "values": "jok_obj"},
>>               {"type": "array", "items": "jok_obj"}]}
>>   ]
>> }
>>
>> I encode the C++ object to a memoryOutputStream, read it back through a
>> memoryInputStream with StreamReader, and finally convert it to a
>> std::string. Decoding that string in C++ works fine, but it fails in Python.
>> =====
>>    std::string AvroObj::encode()
>>    {
>>        std::auto_ptr<avro::OutputStream> out = avro::memoryOutputStream();
>>        avro::EncoderPtr e = avro::binaryEncoder();
>>        e->init(*out);
>>        avro::encode(*e, obj);
>>
>>        std::auto_ptr<avro::InputStream> in = avro::memoryInputStream(*out);
>>        avro::StreamReader* reader = new avro::StreamReader(*in);
>>
>>        std::stringstream ss;
>>        while(reader->hasMore()) {
>>            ss << reader->read();
>>        }
>>
>>        return ss.str();
>>    }
>>
>> =====
>> I am trying to encode {"val" : 0.0}, which in encoded form results in
>> "\x08". But when I send this to Python, it fails with:
>>
>> =============================================
>> ...
>> File "/u/nanda/jok/lib/python/*****/jok/rpc.py", line 451, in to_avro
>>    record = dr.read(decoder)
>>  File "/u/nanda/avro-src-1.6.1/lang/py/src/avro/io.py", line 445, in read
>>    return self.read_data(self.writers_schema, self.readers_schema, decoder)
>>  File "/u/nanda/avro-src-1.6.1/lang/py/src/avro/io.py", line 490, in read_data
>>    return self.read_record(writers_schema, readers_schema, decoder)
>>  File "/u/nanda/avro-src-1.6.1/lang/py/src/avro/io.py", line 690, in read_record
>>    field_val = self.read_data(field.type, readers_field.type, decoder)
>>  File "/u/nanda/avro-src-1.6.1/lang/py/src/avro/io.py", line 488, in read_data
>>    return self.read_union(writers_schema, readers_schema, decoder)