NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig >> mail # dev >> Re: Schema mismatch for files with changing avro schemas


Re: Schema mismatch for files with changing avro schemas
No, AvroStorage doesn't currently support projection push-down. Looking at
the Avro integration code, though, this seems feasible.
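To illustrate what projection push-down could buy here, the following is a minimal Python sketch, not AvroStorage code. It assumes flat Avro record schemas held as parsed JSON dicts, and `prune_schema` and `same_projection` are hypothetical helpers. The idea: if the loader builds a reader schema only from the fields the script references, two input files whose schemas differ only in unused fields project to the same schema, and Avro schema resolution simply skips the writer's extra fields.

```python
import json

# Hypothetical sketch, not AvroStorage code: schemas are flat Avro record
# schemas represented as parsed JSON dicts.

def prune_schema(record_schema: dict, used_fields: set) -> dict:
    """Return a copy of a record schema keeping only the named top-level fields."""
    present = {f["name"] for f in record_schema["fields"]}
    missing = used_fields - present
    if missing:
        raise KeyError(f"fields not in schema: {sorted(missing)}")
    return {
        **record_schema,
        "fields": [f for f in record_schema["fields"] if f["name"] in used_fields],
    }

def same_projection(schemas: list, used_fields: set) -> bool:
    """True if every input schema agrees on the names and types of the used fields."""
    projections = set()
    for schema in schemas:
        pruned = prune_schema(schema, used_fields)
        # Key by field name so field order does not matter, as in Avro resolution.
        by_name = {f["name"]: f["type"] for f in pruned["fields"]}
        projections.add(json.dumps(by_name, sort_keys=True))
    return len(projections) == 1

# Two versions of a schema that differ only in the "debug" field.
v1 = {"type": "record", "name": "Event", "fields": [
    {"name": "user", "type": "string"},
    {"name": "ts", "type": "long"},
    {"name": "debug", "type": "string"},
]}
v2 = {"type": "record", "name": "Event", "fields": [
    {"name": "user", "type": "string"},
    {"name": "ts", "type": "long"},
    {"name": "debug", "type": ["null", "string"]},
]}

print(same_projection([v1, v2], {"user", "ts"}))     # True: "debug" is never read
print(same_projection([v1, v2], {"user", "debug"}))  # False: a used field changed
```

The sketch compares only top-level field names and types; record names, aliases, nested records and Avro's type promotions (e.g., int to long) are deliberately ignored.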

On Thu, Apr 5, 2012 at 11:27 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> For the Avro people, does AvroStorage support column pruning?
>
> 2012/4/5 Stan Rosenberg <[EMAIL PROTECTED]>
>
> > AFAIK, by default AvroStorage enforces that all input files have
> > exactly the same schema. I've submitted a patch to improve this
> > somewhat by allowing different input schemas so long as a union
> > schema can be derived; e.g., say schema 1 contains field 'foo', which
> > is not in schema 2, and schema 2 contains 'bar', which is not in
> > schema 1; then the resulting schema will have both fields, etc.
> > (The patch is here: https://issues.apache.org/jira/browse/PIG-2579.)
> >
> > In your case, you seem to have different schemas where the difference
> > is actually in fields that are never used inside Pig. That's an
> > entirely new use case, AFAIK. The union schema is one workaround.
> > However, it might be better to specify these unused fields and
> > exclude them from validation, perhaps by running validation only
> > against the fields that are referenced in the Pig script.
> >
> > Best,
> >
> > stan
> >
> > On Thu, Apr 5, 2012 at 8:58 AM, Philipp <[EMAIL PROTECTED]> wrote:
> > > Hi list,
> > >
> > > If I run Pig over several Avro files with different schemas, I get a
> > > schema mismatch message, even if the schema has only changed
> > > marginally in a field that I'm not even using in that particular Pig
> > > job. I'm wondering if it would be possible to resolve the mismatch,
> > > e.g., as suggested in:
> > > https://avro.apache.org/docs/current/spec.html#Schema+Resolution
> > >
> > > Regards, Philipp
> > >
> > >
> >
>

--
*Note that I'm no longer using my Yahoo! email address. Please email me at
[EMAIL PROTECTED] going forward.*