Avro >> mail # user >> abuse of aliases?


Koert Kuipers 2012-02-03, 04:03
Doug Cutting 2012-02-03, 20:58
Re: abuse of aliases?
thanks doug

On Fri, Feb 3, 2012 at 3:58 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:

> On 02/02/2012 08:03 PM, Koert Kuipers wrote:
> > i have many avro files with similar data (same meaning, same type, etc.)
> > but different names for the fields.
> > can i create a reader schema that for each field that i am interested in
> > maps it to all the different possible fields in the files by using
> > aliases, and then run map-reduce over the files using this schema?
> > i am talking about tens of aliases per field, and this number will only
> > grow as more data comes in.
> > is this acceptable use of the alias concept, or is it abuse?
>
> This seems like a reasonable use of aliases to me.  Note that aliases
> are limited to elements at the same level of nesting and cannot perform
> arbitrary structural manipulations.  But beyond that, they're meant to
> be a general-purpose mechanism for mapping data from one schema to another.
>
> > and is the
> > alias implementation in avro efficient for such usage?
>
> They should be efficient.  Aliases are implemented by rewriting the old
> schema to have the new names prior to reading.  The rewriting is
> performed once and cached so performance should not be impacted.
>
> Doug
>