Pig >> mail # user >> Override input schema in AvroStorage


Re: Override input schema in AvroStorage
Hi Steven,

The new AvroStorage will let you specify the input schema:
https://issues.apache.org/jira/browse/PIG-3015

In fact, someone made the same request in a comment on the JIRA, which I am
pasting below:

> Furthermore, we occasionally have issues with pig jobs picking the old
> schema when we have a schema update. Manually specifying the schema would
> fix this and give us more flexibility in defining the data we want pig to
> pull from a file.

This JIRA is a work in progress, but hopefully it will be in the next major
release.

Thanks,
Cheolsoo

On Sat, Apr 27, 2013 at 3:24 PM, Enns, Steven <[EMAIL PROTECTED]> wrote:

> Resending now that I am subscribed :)
>
> On 4/25/13 4:01 PM, "Enns, Steven" <[EMAIL PROTECTED]> wrote:
>
> >Hi everyone,
> >
> >I would like to override the input schema in AvroStorage to make a Pig
> >script robust to schema evolution.  For example, suppose a new field is
> >added to an Avro schema with a default value of null.  If the input to a
> >Pig script using this field includes both old and new data, AvroStorage
> >will merge the input schemas from the old and new data.  However, if the
> >input includes only old data, the new schema will not be available to
> >AvroStorage and Pig will fail to interpret the script with an error such
> >as "projected field [newField] does not exist in schema".  If AvroStorage
> >accepted an input schema, the script would be valid for both the new and
> >old data.  Is there any plan to implement this?
> >
> >Thanks,
> >Steve
> >
>
>
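The failure Steve describes, and why a user-supplied input schema would avoid it, can be sketched in a few lines of Python. This is a simplified model under stated assumptions: schemas are plain dicts, and `merge_input_schemas`, `project`, and `resolve` are hypothetical helpers, not the AvroStorage or Avro API. `resolve` mimics only Avro's schema-resolution rule that a reader-schema field absent from the written data takes the reader's default value.

```python
# Hypothetical, simplified model of the problem in this thread.
# Assumption: a schema is {"name": ..., "fields": [{"name": ..., "default": ...}]};
# this is NOT the real Avro or AvroStorage API.

OLD_SCHEMA = {"name": "Event", "fields": [{"name": "id"}]}
NEW_SCHEMA = {"name": "Event",
              "fields": [{"name": "id"}, {"name": "newField", "default": None}]}


def merge_input_schemas(schemas):
    """Union of field names across the schemas found in the input files
    (a stand-in for the merging AvroStorage does on its inputs)."""
    names = []
    for schema in schemas:
        for field in schema["fields"]:
            if field["name"] not in names:
                names.append(field["name"])
    return names


def project(field_names, wanted):
    """Fail the way the Pig script does when a projected field is unknown."""
    if wanted not in field_names:
        raise ValueError(f"projected field [{wanted}] does not exist in schema")
    return wanted


def resolve(record, reader_schema):
    """Read a record written with an older schema through a caller-supplied
    reader schema: missing fields take the reader's default, as in Avro."""
    out = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            out[name] = record[name]
        elif "default" in field:
            out[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name}")
    return out


old_records = [{"id": 1}, {"id": 2}]

# Only old data in the input: the inferred schema lacks newField.
try:
    project(merge_input_schemas([OLD_SCHEMA]), "newField")
except ValueError as err:
    print(err)  # projected field [newField] does not exist in schema

# With an explicit input (reader) schema, the old records resolve cleanly.
print([resolve(r, NEW_SCHEMA) for r in old_records])
# [{'id': 1, 'newField': None}, {'id': 2, 'newField': None}]
```

The design point is that the reader schema, not the set of files that happen to be in the input, decides which fields the script can project, so the same script works whether the input holds old data, new data, or both. Real Avro adds details this sketch skips, such as the rule that a null default on a union field requires `"null"` to be the first branch.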