Re: Confusion re. persisting the schema
With that code you are writing only the raw binary-encoded data, with no
schema attached. To write proper Avro data files, use
org.apache.avro.file.DataFileWriter and append each datum to it. Among other
features, it stores the schema in the file header.
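
To illustrate why the output carries no schema, here is a minimal sketch that
hand-encodes one FileDependency record following the Avro binary encoding
rules (zigzag variable-length integers, length-prefixed UTF-8 strings, arrays
as counted blocks ended by a zero count). It uses only the JDK and is not the
Avro library; the class and method names are hypothetical, and the sample
values "a.c" and "b.h" are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hand-rolled illustration of Avro's binary encoding; NOT the Avro library. */
public class RawAvroEncodingSketch {

    // Avro writes int/long values as zigzag-encoded variable-length integers.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);   // zigzag: small magnitudes stay small
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write((int) z);
    }

    // A string is its UTF-8 byte length (as a long) followed by the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Encodes one record in schema order: "file", then the "imports" array.
    static byte[] encodeFileDependency(String file, String... imports) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, file);
        if (imports.length > 0) {
            writeLong(out, imports.length);   // one block holding every item
            for (String item : imports) {
                writeString(out, item);
            }
        }
        writeLong(out, 0);                    // zero-count block ends the array
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeFileDependency("a.c", "b.h");
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b));
        }
        // prints: 06 61 2e 63 02 06 62 2e 68 00
        System.out.println(hex.toString().trim());
    }
}
```

The ten bytes hold only the field values. Neither the field names nor the
types appear, so a reader has to obtain the schema from somewhere else. That
is the gap a DataFileWriter container file fills by writing the schema into
its header.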

On Oct 12, 2010 11:29 AM, "Christopher Hunt" <[EMAIL PROTECTED]> wrote:

Hi there,

I've just noticed that when I write out my binary data I don't appear to
have a schema saved with it. I was under the impression that Avro saves
schemas along with the data. Thanks for any clarification.

Here's my schema:

  {
    "name": "FileDependency",
    "type": "record",
    "fields": [
        {"name": "file", "type": "string"},
        {"name": "imports", "type": {
            "type": "array", "items": "string"}}
    ]
  }

The code to write out my data is as follows (also appreciate any refinement
suggestions as I'm new to Avro):

  InputStream fileDependencySchemaIs = this.getClass()
  Schema fileDependencySchema = Schema.parse(fileDependencySchemaIs);

  GenericDatumWriter<GenericRecord> genericDatumWriter =
      new GenericDatumWriter<GenericRecord>(fileDependencySchema);
  OutputStream os = new FileOutputStream(new File(workFolder,
  Encoder encoder = new BinaryEncoder(os);
  for (Map.Entry<String, Set<String>> entry : fileDependencies
      .entrySet()) {

    GenericRecord genericRecord = new GenericData.Record(

    genericRecord.put("file", new Utf8(entry.getKey()));

    Set<String> imports = entry.getValue();
    GenericArray<Utf8> genericArray = new GenericData.Array<Utf8>(
    for (String importFile : imports) {
      genericArray.add(new Utf8(importFile));
    }
    genericRecord.put("imports", genericArray);

    genericDatumWriter.write(genericRecord, encoder);
  }

Thanks again.

Kind regards,