Pig, mail # user - Working with changing schemas (avro) in Pig


Re: Working with changing schemas (avro) in Pig
Bill Graham 2012-03-28, 20:29
AvroStorage supports different modes for loading the schema definition. One is
to read it from the Avro data itself, which causes problems with schema
evolution, but you can also specify a schema file. Which one are you using? Can
you attach the snippet of your script that initializes AvroStorage?
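Here is a simplified model of why the schema source matters. Avro resolves each file's writer schema (the one embedded in the file) against a reader schema. Suppose the reader schema declares the new field with a default. Then old files yield that default for the missing field. If the embedded writer schema is used as the reader schema instead, the field is simply absent.

The sketch is a pure-Python model of those resolution rules from the Avro specification. It is not AvroStorage's code or the avro library's API. It ignores type promotion and nested types. The field names `url`, `ts` and `referrer` are hypothetical. It also assumes that AvroStorage passes a supplied schema file to Avro as the reader schema.

```python
from typing import Any, Dict

# Hypothetical "old" writer schema: what the older data files embed.
OLD_SCHEMA: Dict[str, Any] = {
    "type": "record",
    "name": "Click",
    "fields": [
        {"name": "url", "type": "string"},
        {"name": "ts", "type": "long"},
    ],
}

# Hypothetical "new" schema: adds a nullable field with a null default.
# Per the Avro spec, a union's default must match its first branch,
# so "null" is listed first. (Python None stands in for JSON null.)
NEW_SCHEMA: Dict[str, Any] = {
    "type": "record",
    "name": "Click",
    "fields": [
        {"name": "url", "type": "string"},
        {"name": "ts", "type": "long"},
        {"name": "referrer", "type": ["null", "string"], "default": None},
    ],
}


def resolve(record: Dict[str, Any], writer: Dict[str, Any],
            reader: Dict[str, Any]) -> Dict[str, Any]:
    """Simplified model of Avro record resolution (by field name only)."""
    writer_names = {f["name"] for f in writer["fields"]}
    out: Dict[str, Any] = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in writer_names:
            # Field exists in both schemas: take the writer's value.
            out[name] = record[name]
        elif "default" in field:
            # Reader-only field with a default: use the default.
            out[name] = field["default"]
        else:
            # Reader-only field without a default: resolution fails.
            raise ValueError(
                f"reader field {name!r} is missing from the writer schema "
                "and has no default")
    # Writer-only fields are never copied, i.e. they are ignored.
    return out


old_row = {"url": "/home", "ts": 1332966120000}
# Reading an old file with the new schema as reader: referrer becomes None.
print(resolve(old_row, OLD_SCHEMA, NEW_SCHEMA))
# Reading it with its own embedded schema: referrer is simply not there.
print(resolve(old_row, OLD_SCHEMA, OLD_SCHEMA))
```

The first call prints `{'url': '/home', 'ts': 1332966120000, 'referrer': None}` and the second prints `{'url': '/home', 'ts': 1332966120000}`. A script that references the new field therefore needs the reader-schema path to work on old files.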

On Wed, Mar 28, 2012 at 1:22 PM, IGZ Nick <[EMAIL PROTECTED]> wrote:

> Hi guys,
>
> I use Pig to process some clickstream data. I need to track a new field, so
> I added a new field to my Avro schema and changed my Pig script
> accordingly. It works fine with the new files (which have that new column),
> but it breaks when I run it on my old files, which do not have that column
> in their schema (since Avro stores the schema in the data files themselves).
> I was expecting Pig to treat the field as null if it does not exist in a
> particular file. Instead, I now have to maintain separate scripts to
> process the old and new files. Is there any workaround for this? I expect
> I'll have to add new columns frequently, and I don't want to maintain
> a separate script for each window in which the schema is constant.
>
> Thanks,
>
