Pig >> mail # user >> Override input schema in AvroStorage

Enns, Steven 2013-04-25, 23:01
Re: Override input schema in AvroStorage
Hi Steven,

The new AvroStorage will let you specify the input schema:

In fact, somebody made the same request in a comment on the JIRA, which I am
copying and pasting below:

> Furthermore, we occasionally have issues with pig jobs picking the old
> schema when we have a schema update. Manually specifying the schema would
> fix this and give us more flexibility in defining the data we want pig to
> pull from a file.
This JIRA is a work in progress, but hopefully it will be in the next major
release.


On Sat, Apr 27, 2013 at 3:24 PM, Enns, Steven <[EMAIL PROTECTED]> wrote:

> Resending now that I am subscribed :)
> On 4/25/13 4:01 PM, "Enns, Steven" <[EMAIL PROTECTED]> wrote:
> >Hi everyone,
> >
> >I would like to override the input schema in AvroStorage to make a pig
> >script robust to schema evolution.  For example, suppose a new field is
> >added to an avro schema with a default value of null.  If the input to a
> >pig script using this field includes both old and new data, AvroStorage
> >will merge the input schemas from the old and new data.  However, if the
> >input includes only old data, the new schema will not be available to
> >AvroStorage and pig will fail to interpret the script with an error such
> >as "projected field [newField] does not exist in schema".  If AvroStorage
> >accepted an input schema, the script would be valid for both the new and
> >old data.  Is there any plan to implement this?
> >
> >Thanks,
> >Steve
> >
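
The schema-evolution scenario in the quoted message can be illustrated with a small sketch. This is plain Python, not the Avro library or AvroStorage itself: the `Event` record, the `id` field, and the `resolve` helper are hypothetical names made up for illustration, and only `newField` comes from the message. It shows why supplying the newer schema on the reader side would make old data usable: a field missing from an old record is filled from the default declared in the reader schema.

```python
# Illustration only: a plain-Python stand-in for reader-schema resolution.
# The record name, the "id" field, and resolve() are hypothetical.

# Schema the old data was written with (no newField).
OLD_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [{"name": "id", "type": "long"}],
}

# Newer schema: adds newField as a nullable field with a default of null.
NEW_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "newField", "type": ["null", "string"], "default": None},
    ],
}


def resolve(record, reader_schema):
    """Project a record onto reader_schema, filling absent fields from defaults."""
    out = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            out[name] = record[name]
        elif "default" in field:
            out[name] = field["default"]
        else:
            # Mirrors the failure mode in the message: the field is unknown
            # to the data and the reader has no default to fall back on.
            raise KeyError("field [%s] does not exist in schema" % name)
    return out


# Input that contains only old data.
old_data = [{"id": 1}, {"id": 2}]

# With the new schema supplied explicitly, newField resolves to its default.
resolved = [resolve(r, NEW_SCHEMA) for r in old_data]
```

If the reader only ever sees `OLD_SCHEMA`, there is nothing to tell it that `newField` exists, which is the situation that produces the "projected field [newField] does not exist in schema" error described above.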
Enns, Steven 2013-05-01, 17:29