
Markus Resch 2012-03-20, 09:27
Re: Globbing several AVRO files with different (extended) schemes
I'm assuming you are using Pig's AvroStorage function. It appears that it
does not support schema migration, but it certainly could do so.  A
collection of avro files can be 'viewed' as if they all are of one schema
provided they can all resolve to it.  I have several tools that do this
successfully with MapReduce/Pig/Hive.
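
To make "resolve to one schema" concrete, here is a minimal sketch in plain Python. It does not use the Avro library or Pig's AvroStorage; it only mimics two of Avro's record resolution rules (a reader field missing from the writer takes the reader's default, and a writer field unknown to the reader is ignored). The `Event` schema, the field names, and the `resolve_record` helper are all hypothetical, chosen for illustration.

```python
# Illustrative only: mimics Avro record resolution in plain Python.
# Ignores type promotion, aliases, unions beyond null, and nested records.

def resolve_record(record, writer_fields, reader_schema):
    """Project a record written with `writer_fields` onto `reader_schema`."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            # Field exists in the file that was written: keep its value.
            resolved[name] = record[name]
        elif "default" in field:
            # Field was added later: fall back to the reader's default.
            resolved[name] = field["default"]
        else:
            # No value and no default: the schemas cannot be resolved.
            raise ValueError(f"no value or default for field {name!r}")
    # Writer fields absent from the reader schema are silently dropped.
    return resolved


# Hypothetical reader schema: the extended one, with a nullable new field.
READER_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "country", "type": ["null", "string"], "default": None},
    ],
}

# Two hypothetical files: one written before the field was added, one after.
old_file = {"fields": ["id"], "records": [{"id": 1}]}
new_file = {"fields": ["id", "country"],
            "records": [{"id": 2, "country": "DE"}]}

rows = [
    resolve_record(rec, f["fields"], READER_SCHEMA)
    for f in (old_file, new_file)
    for rec in f["records"]
]
print(rows)
```

Under these assumptions, records from the older file come back with the new field set to null, which is the second behavior asked for in the quoted message. The key design point is that the new field needs a default in the reader schema; without one, resolution fails.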

The Pig AvroStorage tool is maintained by the Apache Pig project, so you will
need to ask there for more details.


On 3/20/12 2:27 AM, "Markus Resch" <[EMAIL PROTECTED]> wrote:

>Hi guys,
>Thanks again for your awesome hint about sqoop.
>I have another question: the data I'm working with is stored as Avro
>files in Hadoop. When I glob them, everything works
>perfectly. But when I extend the schema of a single data file while the
>others remain unchanged, everything breaks:
>"currently we assume all avro files under the same "location"
>     * share the same schema and will throw exception if not."
>(e.g. I add a new data field). The behavior I would expect: if I'm
>globbing several files with slightly different schemas, the
>LOAD would either return the intersection of the fields that are
>common to all schemas, or set the atoms of the missing fields to null.
>How could I handle this properly?