Re: Working with changing schemas (avro) in Pig
There is a patch for Avro to deal with this use case:
(See the attached pig example which loads two avro input files with
different schemas.)
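
To illustrate what "dealing with this use case" amounts to, here is a minimal sketch of Avro's schema-resolution rule: when the reader's schema has a field with a default and the writer's schema lacks it, the reader fills in the default. This is plain Python with dicts standing in for records; it is not the Avro or Pig API and not the patch itself. The field names (`user_id`, `url`, `referrer`) and the helper `resolve` are made up for the example.

```python
# Illustrative only: dicts stand in for Avro records, and a list of
# (name, default) pairs stands in for the reader schema. Not the Avro API.

_NO_DEFAULT = object()  # sentinel: the field declares no default value

# Hypothetical reader (new) schema: "referrer" is the newly added field,
# declared with a null default so that old files can still be read.
READER_FIELDS = [
    ("user_id", _NO_DEFAULT),
    ("url", _NO_DEFAULT),
    ("referrer", None),
]


def resolve(record, reader_fields=READER_FIELDS):
    """Project a record written with any older schema onto the reader schema."""
    resolved = {}
    for name, default in reader_fields:
        if name in record:
            # Field was present when the record was written: keep its value.
            resolved[name] = record[name]
        elif default is not _NO_DEFAULT:
            # Field is missing in the old data: use the reader's default.
            resolved[name] = default
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return resolved


old_record = {"user_id": 1, "url": "/home"}                       # old file
new_record = {"user_id": 2, "url": "/cart", "referrer": "/home"}  # new file

# Both records now share one shape, so a single script can process them.
rows = [resolve(r) for r in (old_record, new_record)]
```

With the new field declared as nullable with a null default in the newer schema, records from the old files come out with that field set to null, which is the behaviour the original poster expected.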



On Wed, Mar 28, 2012 at 4:22 PM, IGZ Nick <[EMAIL PROTECTED]> wrote:
> Hi guys,
> I use Pig to process some clickstream data. I need to track a new field, so
> I added it to my Avro schema and changed my Pig script
> accordingly. It works fine with the new files (which have the new column)
> but it breaks when I run it on my old files, which do not have that column
> in the schema (since Avro stores the schema in the data files themselves). I
> was expecting Pig to treat the field as null if it does not exist in a
> file. Instead, I now have to maintain separate scripts to
> process the old and new files. Is there any workaround for this? I
> expect to add new columns frequently, and I don't want to maintain
> a separate script for each window where the schema is constant.
> Thanks,