RE: Schema not getting saved along with Data
It's a "must do".
The real requirement is that the reader of serialized records must have *exactly* the schema that was used to write them. [Note: the reader may also, optionally, specify a different reader's schema that it would like the Avro parser to translate the deserialized records into.]
How you arrange for the parser to get the writer's schema varies with your usage. If you happen to use the org.apache.avro.file.DataFileWriter then it will prefix the file with the schema used to write all the records. The corresponding DataFileReader will use the prefixed schema to properly deserialize the records.
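For illustration, a minimal sketch of that container-file round trip. Nothing here comes from the original thread; the "User" schema, the file name, and the class name are placeholders, so adapt them to your own records.

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class ContainerFileExample {
  public static void main(String[] args) throws Exception {
    // Example schema only; substitute your own record schema.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\","
        + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}");
    File file = new File("users.avro");

    // DataFileWriter prefixes the file with the schema used to write the records.
    try (DataFileWriter<GenericRecord> writer =
             new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
      writer.create(schema, file);
      GenericRecord user = new GenericData.Record(schema);
      user.put("name", "alice");
      writer.append(user);
    }

    // DataFileReader recovers the writer's schema from the file itself;
    // no schema has to be supplied at read time.
    try (DataFileReader<GenericRecord> reader =
             new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
      System.out.println("writer's schema: " + reader.getSchema());
      for (GenericRecord record : reader) {
        System.out.println(record);
      }
    }
  }
}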
If you are putting serialized records into some other store, e.g. a database, and there is a chance that different records could be written with different schemas (or versions of a schema), then you would want to store an indicator of the writer's schema (e.g. a hash of the writer's schema, or a foreign key into a schemas table) along with each record, so that at read time you can provide the correct writer's schema to your org.apache.avro.io.DatumReader.
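A rough sketch of that pattern, using a 64-bit parsing fingerprint as the schema indicator. The in-memory "schemaRegistry" map is a stand-in for whatever real store (a database table, a schema registry service) you would use; the schema and class names are again made up for the example.

import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;
import org.apache.avro.Schema;
import org.apache.avro.SchemaNormalization;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class FingerprintExample {
  // Stand-in schema store: fingerprint -> writer's schema.
  static final Map<Long, Schema> schemaRegistry = new HashMap<>();

  public static void main(String[] args) throws Exception {
    Schema writerSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\","
        + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}");

    // Register the writer's schema under its fingerprint.
    long fp = SchemaNormalization.parsingFingerprint64(writerSchema);
    schemaRegistry.put(fp, writerSchema);

    // Write one record as raw Avro binary; note that no schema ends up in these bytes.
    GenericRecord user = new GenericData.Record(writerSchema);
    user.put("name", "alice");
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(writerSchema).write(user, encoder);
    encoder.flush();
    byte[] payload = out.toByteArray();
    // Persist (fp, payload) together, e.g. as two columns in a table.

    // At read time, look up the writer's schema by the stored fingerprint
    // and hand it to the DatumReader.
    Schema recovered = schemaRegistry.get(fp);
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
    GenericRecord roundTripped =
        new GenericDatumReader<GenericRecord>(recovered, recovered).read(null, decoder);
    System.out.println(roundTripped);
  }
}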

________________________________
From: Sachneet Singh Bains <[EMAIL PROTECTED]>
Sent: Tuesday, March 25, 2014 7:18 AM
To: [EMAIL PROTECTED]
Subject: Schema not getting saved along with Data

Hi,

I am new to Avro and going through the documentation.
From http://avro.apache.org/docs/1.7.6/gettingstartedjava.html
"Data in Avro is always stored with its corresponding schema"

Does the above line convey an "explicitly must do" or an "implicitly done"?
Is it always true even when we write single records to any stream, or does it apply only when "Object Container Files" are used?
I tried writing some records to a file using DatumWriter and I see no schema saved along with them.
Please resolve my confusion.
Thanks,
Sachneet
________________________________