Re: Schema mismatch for files with changing Avro schemas
Bill Graham 2012-04-05, 22:37
No, AvroStorage doesn't currently support projection push-down. Looking at
the Avro integration code, though, this seems feasible.
On Thu, Apr 5, 2012 at 11:27 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> For the Avro people, does AvroStorage support column pruning?
> 2012/4/5 Stan Rosenberg <[EMAIL PROTECTED]>
> > AFAIK, by default AvroStorage enforces that all input files have
> > exactly the same schema. I've submitted a patch to improve
> > this somewhat by allowing different input schemas so long as a union
> > schema can be derived; e.g., say schema 1 contains field 'foo' which
> > is not in schema 2, and schema 2 contains 'bar' which is not in
> > schema 1; then the resulting schema will have both fields, etc.
> > (The patch is here: https://issues.apache.org/jira/browse/PIG-2579.)
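
A minimal sketch of the union-schema idea described above, using plain Python dicts shaped like Avro record schemas. This is only an illustration and is not the code from PIG-2579; the function name `union_record_schemas`, the example record `Event`, and the choice to raise an error when the same field appears with two different types are all assumptions.

```python
def union_record_schemas(schema_a, schema_b):
    """Merge two record schemas (dicts) into one that contains every field.

    Hypothetical helper: fields are matched by name, and a field that
    appears in both schemas must have the same type in each.
    """
    merged = {}
    for schema in (schema_a, schema_b):
        for field in schema["fields"]:
            name = field["name"]
            if name in merged and merged[name]["type"] != field["type"]:
                raise ValueError(f"conflicting types for field {name!r}")
            # Keep the first definition seen for each field name.
            merged.setdefault(name, field)
    return {
        "type": "record",
        "name": schema_a["name"],
        "fields": list(merged.values()),
    }


# Schema 1 has 'foo' but not 'bar'; schema 2 has 'bar' but not 'foo'.
schema_1 = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "long"},
    {"name": "foo", "type": "string"},
]}
schema_2 = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "long"},
    {"name": "bar", "type": "int"},
]}

union = union_record_schemas(schema_1, schema_2)
print([f["name"] for f in union["fields"]])  # ['id', 'foo', 'bar']
```

A real implementation would also have to decide how records lacking 'foo' or 'bar' are filled in when read, for example with nulls or declared defaults.
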
> > In your case, you seem to have different schemas where the difference
> > is actually in fields that are never used inside Pig.
> > That's an entirely new use case, AFAIK. The union schema is one
> > workaround. However, it might be better to specify these unused
> > fields and preclude them from validation, perhaps running validation
> > only against those fields which are specified in the Pig script.
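
A rough sketch of validating only the fields a script actually uses, again with plain Python dicts standing in for Avro record schemas. AvroStorage does not work this way as far as this thread says; the function name `schemas_agree_on` and the example fields `id` and `payload` are hypothetical.

```python
def schemas_agree_on(schemas, used_fields):
    """Return True if every schema declares each used field with the same type.

    Hypothetical check: fields outside `used_fields` are ignored entirely,
    so schemas may differ freely in fields the script never touches.
    """
    for name in used_fields:
        types = []
        for schema in schemas:
            by_name = {f["name"]: f["type"] for f in schema["fields"]}
            if name not in by_name:
                return False  # a used field is missing from one input
            types.append(by_name[name])
        if any(t != types[0] for t in types):
            return False  # a used field changed type between inputs
    return True


old = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "long"},
    {"name": "payload", "type": "string"},
]}
new = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "long"},
    {"name": "payload", "type": ["null", "string"]},  # changed marginally
]}

print(schemas_agree_on([old, new], ["id"]))             # True
print(schemas_agree_on([old, new], ["id", "payload"]))  # False
```
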
> > Best,
> > stan
> > On Thu, Apr 5, 2012 at 8:58 AM, Philipp <[EMAIL PROTECTED]> wrote:
> > > Hi list,
> > >
> > > If I run Pig over several Avro files with different schemas, I get a
> > > schema mismatch message, even if the schema has only changed
> > > marginally in a field that I'm not even using in that particular
> > > Pig job.
> > > I'm wondering if it would be possible to resolve the mismatch, e.g., as
> > > suggested in:
> > > https://avro.apache.org/docs/current/spec.html#Schema+Resolution
> > >
> > > Regards, Philipp
*Note that I'm no longer using my Yahoo! email address. Please email me at
[EMAIL PROTECTED] going forward.*