Avro, mail # dev - clarifications on file format


Re: clarifications on file format
Jeff Hammerbacher 2010-04-01, 17:05
> The map of metadata key/value pairs begins with a long, then a number of
> string-key/bytes-value pairs.  To be consistent with avro maps, should this
> be followed by a long of 0?  The spec doesn't say explicitly, but if the
> header is described by an avro schema I would suspect yes.
>

Not sure if this is what you are talking about, but in the Python
implementation (datafile.py) we define an Avro schema for the header:

"""

META_SCHEMA = schema.parse("""\
{"type": "record", "name": "org.apache.avro.file.Header",
 "fields" : [
   {"name": "magic", "type": {"type": "fixed", "name": "magic", "size": %d}},
   {"name": "meta", "type": {"type": "map", "values": "bytes"}},
   {"name": "sync", "type": {"type": "fixed", "name": "sync", "size": %d}}]}
""" % (MAGIC_SIZE, SYNC_SIZE))

"""
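
To make the map question concrete: if the header really is just a record with this schema, then "meta" would be written like any other Avro map, i.e. a block count (a long), that many key/value pairs, and then a block with count 0 to end the map. Below is a rough Python 3 sketch of that layout. It is not the code in datafile.py; the helper names (write_long, write_bytes, write_meta_map) are made up for illustration, and it only ever writes a single block.

```python
import io

def write_long(out, n):
    """Write n as an Avro long: zig-zag encode, then base-128 varint."""
    n = (n << 1) ^ (n >> 63)  # zig-zag, assuming n fits in 64 signed bits
    while (n & ~0x7F) != 0:
        out.write(bytes([(n & 0x7F) | 0x80]))  # low 7 bits, continuation bit set
        n >>= 7
    out.write(bytes([n]))

def write_bytes(out, b):
    """Avro bytes and strings: a long length, then the raw bytes."""
    write_long(out, len(b))
    out.write(b)

def write_meta_map(out, meta):
    """Write a str -> bytes dict as an Avro map (hypothetical helper)."""
    if meta:
        write_long(out, len(meta))  # block count
        for key, value in meta.items():
            write_bytes(out, key.encode("utf-8"))  # string key
            write_bytes(out, value)                # bytes value
    write_long(out, 0)  # zero-count block marks the end of the map

buf = io.BytesIO()
write_meta_map(buf, {"avro.codec": b"null"})
print(buf.getvalue())
```

With one entry, the output starts with the count (1, which zig-zags to 0x02) and ends with a single 0x00 byte. That trailing zero is the "long of 0" asked about above.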

Also, some written container files should show up in
https://issues.apache.org/jira/browse/AVRO-230 real soon now.

Thanks,
Jeff