Avro, mail # user - Confusion re. persisting the schema


Christopher Hunt 2010-10-12, 05:58
Re: Confusion re. persisting the schema
Harsh J 2010-10-12, 06:04
You are simply writing encoded data with that code. You need to use
org.apache.avro.file.DataFileWriter to write proper Avro data files (by
appending your datum to it), which stores the schema in the file header,
among other features.
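
For illustration, here is a minimal sketch of the DataFileWriter approach, using the FileDependency schema quoted in the original message below. It assumes a recent Avro release on the classpath (Schema.Parser and try-with-resources are newer than the API used in this thread, where Schema.parse plays the same role). The class name FileDependencyDataFile, the helper names write and readSchema, and the sample file names are hypothetical, chosen only for this example.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class FileDependencyDataFile {
  // Same schema as in the original message, inlined so the sketch is self-contained.
  static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"name\": \"FileDependency\", \"type\": \"record\", \"fields\": ["
      + "{\"name\": \"file\", \"type\": \"string\"},"
      + "{\"name\": \"imports\", \"type\": {\"type\": \"array\", \"items\": \"string\"}}]}");

  // Writes one record per map entry into an Avro data file.
  // create() writes the file header, which carries the schema.
  static void write(File out, Map<String, Set<String>> fileDependencies) throws IOException {
    try (DataFileWriter<GenericRecord> writer =
        new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
      writer.create(SCHEMA, out);
      for (Map.Entry<String, Set<String>> entry : fileDependencies.entrySet()) {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("file", entry.getKey());
        // Assumption: a recent Avro accepts any java.util.Collection for an array field.
        record.put("imports", new ArrayList<String>(entry.getValue()));
        writer.append(record);
      }
    }
  }

  // Reads back the schema stored in the data file header; no external schema is supplied.
  static Schema readSchema(File in) throws IOException {
    try (DataFileReader<GenericRecord> reader =
        new DataFileReader<GenericRecord>(in, new GenericDatumReader<GenericRecord>())) {
      return reader.getSchema();
    }
  }

  public static void main(String[] args) throws IOException {
    Map<String, Set<String>> deps = new LinkedHashMap<String, Set<String>>();
    Set<String> imports = new LinkedHashSet<String>();
    imports.add("b.js");
    imports.add("c.js");
    deps.put("a.js", imports);

    File out = File.createTempFile("file-dependencies", ".avro");
    write(out, deps);
    // Prints the schema recovered from the file itself.
    System.out.println(readSchema(out).toString(true));
  }
}
```

The point of the sketch is the contrast with a bare BinaryEncoder: the encoder emits only the encoded datums, whereas the data file can be opened later without the reader knowing the schema in advance.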

On Oct 12, 2010 11:29 AM, "Christopher Hunt" <[EMAIL PROTECTED]> wrote:

Hi there,

I've just noticed that when I write out my binary data I don't appear to
have a schema saved with it. I was under the impression that Avro saves
schemas along with the data. Thanks for any clarification.

Here's my schema:

{
  "name": "FileDependency",
  "type": "record",
  "fields": [
      {"name": "file", "type": "string"},
      {"name": "imports", "type": {
          "type": "array", "items": "string"}
      }
    ]
}

The code to write out my data is as follows (also appreciate any refinement
suggestions as I'm new to Avro):

  @Cleanup
  InputStream fileDependencySchemaIs = this.getClass()
      .getResourceAsStream(FILE_DEPENDENCY_GRAPH_SCHEMA_NAME);
  Schema fileDependencySchema = Schema.parse(fileDependencySchemaIs);

  GenericDatumWriter<GenericRecord> genericDatumWriter =
      new GenericDatumWriter<GenericRecord>(fileDependencySchema);
  @Cleanup
  OutputStream os = new FileOutputStream(new File(workFolder,
      FILE_DEPENDENCY_GRAPH_NAME));
  Encoder encoder = new BinaryEncoder(os);
  for (Map.Entry<String, Set<String>> entry : fileDependencies
      .entrySet()) {

    GenericRecord genericRecord = new GenericData.Record(
    fileDependencySchema);

    genericRecord.put("file", new Utf8(entry.getKey()));

    Set<String> imports = entry.getValue();
    GenericArray<Utf8> genericArray = new GenericData.Array<Utf8>(
        imports.size(),
        Schema.createArray(Schema.create(Type.STRING)));
    for (String importFile : imports) {
      genericArray.add(new Utf8(importFile));
    }
    genericRecord.put("imports", genericArray);

    genericDatumWriter.write(genericRecord, encoder);
  }
  encoder.flush();

Thanks again.

Kind regards,
Christopher
Christopher Hunt 2010-10-12, 11:02