Re: Decode without using DataFileReader
I do not understand what you're trying to achieve here.

Encoders work at the primitive level: they merely serialize the values of a given data structure (records, unions, for example) and do not look at the schema. (Notice that you create a record with a schema, not an encoder with a schema.) Decoders could do the same and read back bare primitives, but given a schema they can read back properly structured data. Since encoders do not store the schema, decoders need it supplied externally.
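
To make the primitive-level point concrete, here is a minimal sketch in plain Java (no Avro dependency) of the zig-zag variable-length coding that the Avro specification uses for int and long values. The names ZigZagSketch, encodeRecord and decodeRecord are made up for illustration, and the two fields are assumed to be numeric, mirroring abc and inner_abc from the sample quoted below. The resulting bytes carry no field names or types, so the reader has to learn the field order from somewhere else.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: hand-rolled zig-zag varint coding, not the Avro library itself.
class ZigZagSketch {

    // Write one long as a zig-zag encoded, variable-length integer.
    static void writeLong(long n, ByteArrayOutputStream out) {
        long z = (n << 1) ^ (n >> 63);           // zig-zag: small magnitudes become small codes
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write((int) z);                       // final byte, continuation bit clear
    }

    // Read one zig-zag varint starting at pos[0]; advances pos[0] past it.
    static long readLong(byte[] buf, int[] pos) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos[0]++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);              // undo zig-zag
    }

    // A record is just its field values back to back, in declaration order.
    // Assumption: the outer field "abc" comes first, then the nested "inner_abc".
    static byte[] encodeRecord(long abc, long innerAbc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(abc, out);
        writeLong(innerAbc, out);
        return out.toByteArray();
    }

    // Decoding only works because the caller already knows there are two
    // numeric fields in this order; nothing in the bytes says so.
    static long[] decodeRecord(byte[] bytes) {
        int[] pos = {0};
        long abc = readLong(bytes, pos);
        long innerAbc = readLong(bytes, pos);
        return new long[] {abc, innerAbc};
    }

    public static void main(String[] args) {
        byte[] bytes = encodeRecord(1050324L, 23490843L);
        long[] fields = decodeRecord(bytes);
        System.out.println(fields[0] + " " + fields[1]);
    }
}
```

With the real library, that missing knowledge is what the schema passed to the datum reader provides.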

DataFiles solve this for you by writing the schema itself into the file as a header. When the file is read back, the reader loads this schema from the header and uses it to decode the records.
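
The header idea can be sketched without Avro at all. The following is not Avro's container file format (which also has a magic number, a metadata map and sync markers); it is a simplified stand-in built on java.io, with hypothetical names (SchemaHeaderSketch, write, readSchema, readValues), showing a writer that puts the schema text first and a reader that recovers it before touching the data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: a toy "schema header + raw values" layout, not Avro's format.
class SchemaHeaderSketch {

    // Writer: schema text first, then the bare field values.
    static byte[] write(String schemaJson, long abc, long innerAbc) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(schemaJson);   // header: the schema travels with the data
            out.writeLong(abc);         // body: values only, no names or types
            out.writeLong(innerAbc);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reader, step 1: recover the schema from the header.
    static String readSchema(byte[] file) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reader, step 2: skip the header, then read the values. A real reader
    // would use the recovered schema to decide what to read here.
    static long[] readValues(byte[] file) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
            in.readUTF();
            return new long[] {in.readLong(), in.readLong()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the layout is that whoever opens the bytes later needs nothing except the bytes themselves, which is the convenience a plain encoder stream does not give you.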

On 05-Dec-2011, at 11:43 PM, Gaurav wrote:

> >> it makes no sense for the encoder to store schema for every given record,
> >> into a stream.
> Agree. It's not even the encoder's/decoder's job to store the schema.
> While writing data, I noticed that we don't even need DataFileWriter, all it
> needs is GenericDatumWriter, Encoder and any kind of output stream (which
> can also be a file output stream).
> Sample:
> ------------------------------------------------
> private static ByteArrayOutputStream EncodeData() throws IOException {
>     Schema schema = createMetaData();
>     GenericDatumWriter<GenericData.Record> datum =
>         new GenericDatumWriter<GenericData.Record>(schema);
>     // Build the nested "trade" record from that field's own schema.
>     GenericData.Record inner_record =
>         new GenericData.Record(schema.getField("trade").schema());
>     inner_record.put("inner_abc", 23490843L);
>     GenericData.Record record = new GenericData.Record(schema);
>     record.put("abc", 1050324);
>     record.put("trade", inner_record);
>     ByteArrayOutputStream out = new ByteArrayOutputStream();
>     // ENCODER_FACTORY is defined elsewhere; presumably an EncoderFactory.
>     BinaryEncoder encoder = ENCODER_FACTORY.binaryEncoder(out, null);
>     datum.write(record, encoder);
>     encoder.flush(); // push any buffered bytes into the stream
>     out.close();
>     return out;
> }
> ------------------------------------------------
> Then why can't I just reuse the same output stream to read back the metadata
> and data? It should not be the responsibility of the stream reader (which in
> this case is DataFileReader) to parse out the schema.
> Thanks,
> Gaurav Nanda
> --
> View this message in context: http://apache-avro.679487.n3.nabble.com/Decode-without-using-DataFileReader-tp3561722p3562127.html
> Sent from the Avro - Users mailing list archive at Nabble.com.