

Re: Working with changing schemas (avro) in Pig
There is a patch for Pig's Avro support that deals with this use case:
https://issues.apache.org/jira/browse/PIG-2579
(See the attached Pig example, which loads two Avro input files with
different schemas.)

Best,

stan

On Wed, Mar 28, 2012 at 4:22 PM, IGZ Nick <[EMAIL PROTECTED]> wrote:
> Hi guys,
>
> I use Pig to process some clickstream data. I need to track a new field, so
> I added a new field to my Avro schema and changed my Pig script
> accordingly. It works fine with the new files (which have the new column),
> but it breaks when I run it on my old files, which do not have that column
> in the schema (since Avro stores the schema in the data files themselves). I
> was expecting Pig to treat the field as null if that particular
> field does not exist. But now I have to maintain separate scripts to
> process the old and new files. Is there any workaround for this? I expect
> to add new columns frequently, and I don't want to maintain
> a separate script for each window where the schema is constant.
>
> Thanks,
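
The behavior the question hopes for (a field missing from old files reading as null) matches Avro's schema resolution rule: when the reader's schema has a field that the writer's schema lacks, the reader's default for that field is used, and a missing field with no default is an error. Below is a minimal pure-Python sketch of that rule, not the PIG-2579 patch itself and not the Avro library's API. The schema, the field names (`user_id`, `url`, `referrer`), and the `resolve` helper are all hypothetical, chosen only for illustration.

```python
import json

# Hypothetical reader schema: the original fields plus one new optional
# field. "referrer" is an illustrative name, not taken from the thread.
READER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Click",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "url", "type": "string"},
    {"name": "referrer", "type": ["null", "string"], "default": null}
  ]
}
""")

_MISSING = object()  # sentinel, so an explicit None value is not confused with "absent"


def resolve(record, reader_schema):
    """Project an already-decoded record onto the reader schema.

    Fields absent from the record take the reader's default; an absent
    field with no default raises, mirroring Avro's resolution rule.
    Fields the reader does not know about are dropped.
    """
    out = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        value = record.get(name, _MISSING)
        if value is _MISSING:
            if "default" not in field:
                raise ValueError(f"no value or default for field {name!r}")
            value = field["default"]
        out[name] = value
    return out


# One row as written by an old file (no "referrer") and one by a new file.
old_row = {"user_id": "u1", "url": "/home"}
new_row = {"user_id": "u2", "url": "/cart", "referrer": "/home"}
rows = [resolve(r, READER_SCHEMA) for r in (old_row, new_row)]
```

With a single reader schema like this, old and new rows come out with the same set of columns, so one script can process both. The key design point is declaring the new field as a union with `"null"` first and giving it a `null` default.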