Avro >> mail # user >> How to deserialize avro file with union/many schemas?


Re: How to deserialize avro file with union/many schemas?
Echo,

You will need to provide us with some code to be able to help.
If you can share the code that writes this Avro file and the code that reads it, we could help more.

On a related note, the union schema is not an alien concept in Avro.
If Avro wrote the unions, it should be able to read them as well.

Thx – Sachin

From: Echo <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Date: Thursday, July 24, 2014 at 7:23 AM
To: [EMAIL PROTECTED]
Subject: RE: How to deserialize avro file with union/many schemas?

Hi Sachin

I didn't write the code that creates the schema; I just need to use the Avro file. The Avro library can't read the file with that 'union' schema, so I wonder:

- What's the right way to define the union schema so that the Avro library can deserialize it?

- If I have to read an Avro file whose schema is defined as described in my last email, I guess I have to write code to parse it. Any idea how that can be done?

Thanks
________________________________
From: Sachin Goyal <[EMAIL PROTECTED]>
Sent: 7/23/2014 8:43 PM
To: [EMAIL PROTECTED]
Subject: Re: How to deserialize avro file with union/many schemas?
Hi Echo,

Can you share the code that you used to create the below schema?
How are you appending the schemas into one object?
And how is the data being appended to the same object?

Wouldn’t it be simpler to segregate the objects for different schemas such that
one group of objects contains only one schema and its related data objects?

-Sachin

From: Echo Li <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Date: Wednesday, July 23, 2014 at 7:50 PM
To: [EMAIL PROTECTED]
Subject: Re: How to deserialize avro file with union/many schemas?

Thanks, Sachin.

My schema is more like:
[ { schema-one with type="record"}, {schema-two with type="record"}, ...]

It is followed by datums, each pertaining to one of the schemas, and each schema maps to one class.
On Wed, Jul 23, 2014 at 3:42 PM, Sachin Goyal <[EMAIL PROTECTED]> wrote:

To see a union schema, do the following:
System.out.println (ReflectData.AllowNull.get().getSchema(YourClass.class));

And then do the following:
System.out.println (ReflectData.get().getSchema(YourClass.class));

Diff the two outputs.
The first one generates a union with null for each and every field.

Hope that helps.
Sachin
From: Echo Li <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Date: Wednesday, July 23, 2014 at 3:09 PM
To: [EMAIL PROTECTED]
Subject: Re: How to deserialize avro file with union/many schemas?

Hi Mike,

I read through most of the docs on the Avro site and don't see anything about a "union schema". Mike, would you mind giving me an example of how the union schema is defined? Also, what package/method can retrieve the master schema from the Avro file? Should "getSchema()" work? And how do I read each Avro datum without knowing its corresponding schema?

I very much appreciate your help!
On Tue, Jul 22, 2014 at 10:25 PM, Michael Pigott <[EMAIL PROTECTED]> wrote:

It's just a regular Union :-) http://avro.apache.org/docs/1.7.6/spec.html#Unions
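
For reference, a minimal sketch of what a top-level union of two records could look like as schema JSON. The record and field names (SchemaOne, SchemaTwo, id, label) are made up for illustration and are not taken from the file discussed in this thread:

```json
[
  {"type": "record", "name": "SchemaOne",
   "fields": [{"name": "id", "type": "long"}]},
  {"type": "record", "name": "SchemaTwo",
   "fields": [{"name": "label", "type": "string"}]}
]
```

Per the spec linked above, a union is written as a JSON array of its branch schemas. It may contain several records only because named types (record, fixed, enum) with distinct names are allowed to share a type.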

Regards,
Mike

On Jul 23, 2014 1:22 AM, "Echo" <[EMAIL PROTECTED]> wrote:
Thanks Mike, that makes sense. Is there any documentation I can read about union schemas?

On Jul 22, 2014, at 2:32 PM, Michael Pigott <[EMAIL PROTECTED]> wrote:

Echo,
    Just to make sure I understand you correctly - do you have a file with multiple Avro datums in it, each one following a separate schema?  And are all of these schemas unioned together in a file-level "master schema?"  (As far as I know, Avro file readers and writers only support one schema per file, so this is the only way your question makes sense to me.)
    If that's the case, then you can get the file's "master schema" and determine what all of the different types are:

List<Schema> allTypes = masterSchema.getTypes(); // Assumes masterSchema is of Type.UNION

Then when you read each Avro datum in the file, you can check which of the schemas it conforms to, and write a new file with just that sub-schema and the one datum in it.
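
A minimal sketch of the idea above, assuming the Avro Java library is on the classpath. The class name UnionFileSketch, the two record schemas, and their fields are hypothetical stand-ins for the real file's contents. It only shows how each datum read from a union-schema file reports which branch it belongs to; it does not write the per-schema output files.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class UnionFileSketch {
    // Hypothetical union of two record schemas (assumption, not the real file's schema).
    static final String UNION_JSON =
        "[{\"type\":\"record\",\"name\":\"SchemaOne\","
      + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]},"
      + "{\"type\":\"record\",\"name\":\"SchemaTwo\","
      + "\"fields\":[{\"name\":\"label\",\"type\":\"string\"}]}]";

    /** Writes one datum per branch, reads the file back, returns each datum's record name. */
    static List<String> roundTrip(File file) throws IOException {
        Schema union = new Schema.Parser().parse(UNION_JSON);
        List<Schema> branches = union.getTypes(); // valid because union is of Type.UNION

        GenericRecord one = new GenericData.Record(branches.get(0));
        one.put("id", 1L);
        GenericRecord two = new GenericData.Record(branches.get(1));
        two.put("label", "hello");

        // The file-level ("master") schema is the union itself.
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(union))) {
            writer.create(union, file);
            writer.append(one);
            writer.append(two);
        }

        List<String> names = new ArrayList<>();
        // No schema is passed to the reader: it uses the one stored in the file header.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord datum : reader) {
                // Each datum carries the schema of the union branch it was written with.
                names.add(datum.getSchema().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("union-sketch", ".avro");
        file.deleteOnExit();
        System.out.println(roundTrip(file));
    }
}
```

In this sketch, datum.getSchema() is what lets the caller dispatch on the branch, for example to pick a class per schema or to route the datum to a separate single-schema file.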

Does that make sense?
Mike
On Tue, Jul 22, 2014 at 3:22 PM, Lewis John Mcgibbney <[EMAIL PROTECTED]> wrote:
For the