Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig, mail # user - Working with changing schemas (avro) in Pig


Copy link to this message
-
Re: Working with changing schemas (avro) in Pig
Stan Rosenberg 2012-03-28, 22:00
There is a patch for Avro to deal with this use case:
https://issues.apache.org/jira/browse/PIG-2579
(See the attached pig example which loads two avro input files with
different schemas.)

Best,

stan

On Wed, Mar 28, 2012 at 4:22 PM, IGZ Nick <[EMAIL PROTECTED]> wrote:
> Hi guys,
>
> I use Pig to process some clickstream data. I need to track a new field, so
> I added a new field to my avro schema, and changed my Pig script
> accordingly. It works fine with the new files (which have that new column)
> but it breaks when I run it on my old files which do not have that column
> in the schema (since avro stores schema in the data files itself). I was
> expecting that Pig will assume the field to be null if that particular
> field does not exist. But now I am having to maintain separate scripts to
> process the old and new files. Is there any workaround this? Because I
> figure I'll have to add new column frequently and I don't want to maintain
> a separate script for each window where the schema is constant.
>
> Thanks,