On 02/02/2012 08:03 PM, Koert Kuipers wrote:
> i have many avro files with similar data (same meaning, same type, etc.)
> but different names for the fields.
> can i create a reader schema that for each field that i am interested in
> maps it to all the different possible fields in the files by using
> aliases, and then run map-reduce over the files using this schema?
> i am talking about tens of aliases per field, and this number will only
> grow as more data comes in.
> is this acceptable use of the alias concept, or is it abuse?
This seems like a reasonable use of aliases to me. Note that aliases
are limited to elements at the same level of nesting and cannot perform
arbitrary structural manipulations. But beyond that, they're meant to
be a general-purpose mechanism for mapping data from one schema to another.
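
To make that concrete, a reader schema along these lines is roughly what
I have in mind. This is only a sketch: the record name, field names and
aliases are all made up for illustration, and it is built as a Python
dict simply so it can be dumped to JSON. Each alias sits next to the
field it renames, at the same level of nesting.

```python
import json

# Hypothetical reader schema: every name and alias here is invented.
# Each field lists the other names it may carry in older files.
reader_schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "user_id", "type": "string",
         "aliases": ["uid", "userId", "user"]},
        {"name": "amount", "type": "double",
         "aliases": ["amt", "total", "value"]},
    ],
}

# Serialize to the JSON form that would be handed to Avro as the reader schema.
reader_schema_json = json.dumps(reader_schema, indent=2)
print(reader_schema_json)
```

An alias can rename a field in place, but it cannot move a field into or
out of a nested record.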
> and is the
> alias implementation in avro efficient for such usage?
They should be efficient. Aliases are implemented by rewriting the old
(writer's) schema to use the new names before reading. The rewriting is
performed once and cached, so the number of aliases should not affect
per-record read performance.