Pig >> mail # user >> Schema mismatch for files with changing avro schemas

Re: Schema mismatch for files with changing avro schemas
AFAIK, by default AvroStorage enforces that all input files have
exactly the same schema.  I've submitted a patch to improve
this somewhat by allowing different input schemas so long as a union
schema can be derived; e.g., say schema 1 contains field 'foo' which
is not in schema 2, and schema 2 contains 'bar' which is not in schema 1,
then the resulting schema will have both fields, etc.
(The patch is here: https://issues.apache.org/jira/browse/PIG-2579.)
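
To make the union idea concrete, here is a minimal Python sketch. It is not
the code from the patch; the function name, the dict-based schema layout, and
the choice to raise an error on conflicting field types are assumptions made
purely for illustration.

```python
def union_record_schemas(schema_a, schema_b):
    """Merge two record schemas (plain dicts) field by field.

    Illustrative only: a field that appears in both schemas with
    different types raises an error here; a real implementation
    might resolve such conflicts differently.
    """
    merged = {}
    for schema in (schema_a, schema_b):
        for field in schema["fields"]:
            name = field["name"]
            if name in merged and merged[name]["type"] != field["type"]:
                raise ValueError(f"conflicting types for field {name!r}")
            # Keep the first definition seen for each field name.
            merged.setdefault(name, field)
    return {
        "type": "record",
        "name": schema_a["name"],
        "fields": list(merged.values()),
    }


# Hypothetical schemas: 'foo' exists only in schema 1, 'bar' only in schema 2.
schema_1 = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "long"},
    {"name": "foo", "type": "string"},
]}
schema_2 = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "long"},
    {"name": "bar", "type": "int"},
]}

union = union_record_schemas(schema_1, schema_2)
print([f["name"] for f in union["fields"]])  # ['id', 'foo', 'bar']
```

The merged schema keeps the shared field once and carries both 'foo' and
'bar', which is the behaviour described above.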

In your case, you seem to have different schemas where the difference
is actually in fields that are never used inside Pig.
That's an entirely new use case, AFAIK.  The union schema is one
workaround.  However, it might be better to specify these unused
fields and exclude them from validation, perhaps running validation only
against those fields which are specified in the Pig script.
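
A rough sketch of what validating only the used fields could look like, in
Python. Nothing like this exists in AvroStorage as far as I know; the helper
names, the dict-based schemas, and the rule that a used field must be present
with an identical type in every input are all assumptions for illustration.

```python
def field_types(schema):
    """Map field name -> declared type for a record schema dict."""
    return {f["name"]: f["type"] for f in schema["fields"]}


def mismatched_used_fields(schemas, used_fields):
    """Return the used fields that are missing or typed differently
    in at least one of the input schemas. Unused fields are ignored."""
    problems = []
    for name in used_fields:
        seen = [field_types(s).get(name) for s in schemas]
        if any(t is None or t != seen[0] for t in seen):
            problems.append(name)
    return problems


# Hypothetical inputs: only the 'debug' field changed between files.
old_schema = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "long"},
    {"name": "payload", "type": "string"},
    {"name": "debug", "type": "string"},
]}
new_schema = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "long"},
    {"name": "payload", "type": "string"},
    {"name": "debug", "type": "int"},
]}

# A script that only touches 'id' and 'payload' passes validation.
print(mismatched_used_fields([old_schema, new_schema], ["id", "payload"]))  # []
# A script that also reads 'debug' is flagged.
print(mismatched_used_fields([old_schema, new_schema], ["id", "debug"]))  # ['debug']
```

The list of used fields would have to come from the Pig script itself; how to
extract it is left open here.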



On Thu, Apr 5, 2012 at 8:58 AM, Philipp <[EMAIL PROTECTED]> wrote:
> Hi list,
> if I run pig over several avro files with different schemas I get a schema
> mismatch message, even if the schema has only changed marginally in a field
> that I'm not even using in that particular pig job.
> I'm wondering if it would be possible to resolve the mismatch, e.g. as
> suggested in:
> https://avro.apache.org/docs/current/spec.html#Schema+Resolution
> Regards, Philipp