Avro >> mail # user >> Globbing several AVRO files with different (extended) schemes


Markus Resch 2012-03-20, 09:27
Scott Carey 2012-03-20, 21:08
Re: Globbing several AVRO files with different (extended) schemes
Supporting schema migration is a badly needed feature in AvroStorage.  I'm
not able to add it in the near future.  Anyone else interested?

On Tue, Mar 20, 2012 at 2:08 PM, Scott Carey <[EMAIL PROTECTED]> wrote:

> I'm assuming you are using Pig's AvroStorage function. It appears that it
> does not support schema migration, but it certainly could do so.  A
> collection of avro files can be 'viewed' as if they all are of one schema
> provided they can all resolve to it.  I have several tools that do this
> successfully with MapReduce/Pig/Hive.
>
> The Pig AvroStorage tool is maintained by the Apache Pig project; you will
> need to ask there for more details.
>
> -Scott
>
>
>
> On 3/20/12 2:27 AM, "Markus Resch" <[EMAIL PROTECTED]> wrote:
>
> >Hi guys,
> >
> >Thanks again for your awesome hint about sqoop.
> >
> >I have another question: the data I'm working with is stored as Avro
> >files in Hadoop. When I glob them, everything works perfectly. But
> >when I extend the schema of a single data file while the others remain
> >unchanged, everything breaks:
> >
> >"currently we assume all avro files under the same "location"
> >     * share the same schema and will throw exception if not."
> >
> >(e.g. when I add a new data field). The behavior I would expect: if I'm
> >globbing several files with slightly different schemas, the LOAD
> >would either return the intersection of the fields that are common to
> >all schemas, or null out the atoms of the missing fields.
> >
> >How could I handle this properly?
> >
> >Thanks
> >
> >Markus
> >
> >
> >
> >
>
>
>
--
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
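
For readers of this thread: the two behaviors Markus asks for, and the "resolve to one schema" idea Scott mentions, can be sketched in plain Python. This is a simplified illustration only. It does not use the Avro library or Pig's AvroStorage, and the field names, the NO_DEFAULT marker, and the helpers resolve() and common_fields() are all hypothetical. It assumes records are flat dicts and that the new field is nullable with a null default.

```python
# Simplified illustration of Avro-style schema resolution.
# NOT the Avro or Pig API; all names below are hypothetical.

NO_DEFAULT = object()  # marks a reader field that has no default value


def resolve(record, reader_fields):
    """Project one record onto the reader schema.

    Fields unknown to the reader are dropped; fields absent from the
    record are filled from the reader's default, or raise if none exists.
    """
    resolved = {}
    for name, default in reader_fields:
        if name in record:
            resolved[name] = record[name]
        elif default is not NO_DEFAULT:
            resolved[name] = default
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return resolved


def common_fields(schemas):
    """Return field names present in every schema, in the first schema's order."""
    shared = set(schemas[0]).intersection(*schemas[1:])
    return [name for name in schemas[0] if name in shared]


# Hypothetical data: an old file, and a newer file that gained 'referrer'.
old_records = [{"id": 1, "url": "/a"}]
new_records = [{"id": 2, "url": "/b", "referrer": "/a"}]
all_records = old_records + new_records

# Option 1: read everything with the extended schema; missing fields become null.
reader_fields = [("id", NO_DEFAULT), ("url", NO_DEFAULT), ("referrer", None)]
unified = [resolve(r, reader_fields) for r in all_records]

# Option 2: keep only the fields common to all schemas.
shared = common_fields([["id", "url"], ["id", "url", "referrer"]])
projected = [{name: r[name] for name in shared} for r in all_records]

print(unified)
print(projected)
```

Option 1 only works when every field missing from an older file has a default in the reader schema, which is also what Avro's own schema resolution requires. Option 2 needs no defaults but silently discards the newer fields.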