Koert Kuipers 2012-02-03, 04:03
Doug Cutting 2012-02-03, 20:58
On Fri, Feb 3, 2012 at 3:58 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> On 02/02/2012 08:03 PM, Koert Kuipers wrote:
> > I have many Avro files with similar data (same meaning, same type, etc.)
> > but different names for the fields.
> > Can I create a reader schema that, for each field I am interested in,
> > uses aliases to map it to all the different field names in the files,
> > and then run map-reduce over the files using this schema?
> > I am talking about tens of aliases per field, and this number will only
> > grow as more data comes in.
> > Is this an acceptable use of the alias concept, or is it abuse?
> This seems like a reasonable use of aliases to me. Note that aliases
> are limited to elements at the same level of nesting and cannot perform
> arbitrary structural manipulations. But beyond that, they're meant to
> be a general-purpose mechanism for mapping data from one schema to another.
> > And is the alias implementation in Avro efficient for such usage?
> They should be efficient. Aliases are implemented by rewriting the old
> (writer's) schema to use the new (reader's) names prior to reading. The
> rewriting is performed once and cached, so performance should not be impacted.