Re: Working with changing schemas (avro) in Pig
AvroStorage supports different modes for loading the schema definition. One
is to get it from the Avro record, which would cause problems with schema
evolution, but you can also specify a schema file. Which are you using? Can
you attach the snippet of your script that initializes AvroStorage?
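
For background on why a separate schema can help: under Avro's schema resolution rules, a field that exists in the reader's schema but not in the writer's is filled from the reader field's default, and reading fails if that field has no default. The sketch below imitates that one rule in plain Python on already-decoded records. It does not use AvroStorage or the Avro library, and the record name `Click`, the field names, and the `resolve` helper are all made up for illustration.

```python
import json

# Hypothetical reader schema. The newly added field "referrer" is a
# nullable union with a null default, so records written before the
# field existed can still be resolved against it.
READER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Click",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "url", "type": "string"},
    {"name": "referrer", "type": ["null", "string"], "default": null}
  ]
}
""")


def resolve(record, reader_schema):
    """Project a decoded record onto the reader schema.

    Mimics only the 'missing field takes the reader default' rule;
    it does no type checking or promotion.
    """
    out = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            out[name] = record[name]
        elif "default" in field:
            out[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return out


# An "old" record lacks the new field; a "new" record carries it.
old_record = {"user_id": "u1", "url": "/home"}
new_record = {"user_id": "u2", "url": "/cart", "referrer": "/home"}

resolved_old = resolve(old_record, READER_SCHEMA)
resolved_new = resolve(new_record, READER_SCHEMA)
print(resolved_old)
print(resolved_new)
```

Whether AvroStorage applies this resolution for you depends on how it is initialized, which is why the snippet from your script matters.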

On Wed, Mar 28, 2012 at 1:22 PM, IGZ Nick <[EMAIL PROTECTED]> wrote:

> Hi guys,
> I use Pig to process some clickstream data. I need to track a new field, so
> I added it to my Avro schema and changed my Pig script accordingly. It
> works fine with the new files (which have the new column), but it breaks
> when I run it on my old files, which do not have that column in their
> schema (since Avro stores the schema in the data files themselves). I was
> expecting Pig to treat the field as null when it does not exist. Instead,
> I now have to maintain separate scripts to process the old and new files.
> Is there any workaround for this? I expect to add new columns frequently,
> and I don't want to maintain a separate script for each window in which
> the schema is constant.
> Thanks,

*Note that I'm no longer using my Yahoo! email address. Please email me at
[EMAIL PROTECTED] going forward.*