-
Nesting avro with avro or proto binary representations
Shirahatti, Nikhil 2012-02-16, 04:23
Hello Avro Users,
My question is whether we can use an avro schema as a wrapper for another avro/protobuf binary representation.
Example:
{
"namespace": "com.AvroExample",
"name": "wrapper",
"type": "record",
"fields": [
{"name": "timestamp", "type": "long"},
{"name": "header", "type": "string"},
{"name": "body", "type": "bytes"} ]
}
Then the body can be filled in with the binary representation (avro/protobuf/json). Can the avro schema below be wrapped inside the wrapper schema above? If so, are there any pointers on how to do it?
{
"namespace": "com.AvroExample",
"name": "server",
"type": "record",
"fields": [
{ "name" : "status", "type": "string"},
{ "name" : "user", "type": "string"}]
}
Thanks,
Nikhil
-
Re: Nesting avro with avro or proto binary representations
Doug Cutting 2012-02-16, 15:55
On 02/15/2012 08:23 PM, Shirahatti, Nikhil wrote:
> My question is whether we can use an avro schema as a wrapper for
> another avro/protobuf binary representation.
Yes, that can certainly be done.
A case I've heard where something like this might be useful is query results. For example one might have a results schema like:
{"type": "record", "name": "results", "fields": [
{"name": "schema", "type": "string"},
{"name": "values", "type": {"type": "array", "items": "bytes"}}
]}
For a query that contains the equivalent of 'SELECT (DATE, ID)' the value of the "schema" field in the results might then be something like:
{"type": "record", "name": "result", "fields": [
{"name": "date", "type": "long"},
{"name": "id", "type": "int"}
]}
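As a rough illustration of the results pattern above, the following Java sketch encodes each row with the inner "result" schema and stores the encoded rows as the "bytes" items of the outer "results" record. It hand-rolls the Avro binary encoding (zigzag varints for int/long, length-prefixed bytes and strings, arrays written as counted blocks ending in a zero count) instead of calling the Avro library, and the names QueryResultsSketch, encodeRow, and encodeResults are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch: hand-rolled Avro binary encoding, not the Avro library API.
class QueryResultsSketch {

    // Avro int/long: zigzag-encode, then write base-128 groups, low group first.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro bytes: the length as a long, followed by the raw bytes.
    static void writeBytes(ByteArrayOutputStream out, byte[] b) {
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    // Avro string: encoded like bytes, with UTF-8 content.
    static void writeString(ByteArrayOutputStream out, String s) {
        writeBytes(out, s.getBytes(StandardCharsets.UTF_8));
    }

    // One row of the inner "result" record: fields in schema order, no framing.
    static byte[] encodeRow(long date, int id) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, date);
        writeLong(out, id); // int and long share the same wire encoding
        return out.toByteArray();
    }

    // Outer "results" record: the inner schema as a string, then an array of bytes.
    static byte[] encodeResults(String innerSchemaJson, List<byte[]> rows) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, innerSchemaJson);
        if (!rows.isEmpty()) {
            writeLong(out, rows.size()); // a single block holding every item
            for (byte[] row : rows) {
                writeBytes(out, row);
            }
        }
        writeLong(out, 0); // a zero count terminates the array
        return out.toByteArray();
    }
}
```

A reader would first decode the "schema" string, parse it, and then use it to decode each item of "values". In practice one would more likely use the Avro library's DatumWriter and DatumReader at both levels; the hand-rolled encoder only keeps the sketch free of dependencies.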
Doug
-
Re: Nesting avro with avro or proto binary representations
Scott Carey 2012-02-16, 20:06
On 2/15/12 8:23 PM, "Shirahatti, Nikhil" <[EMAIL PROTECTED]> wrote:
> Hello Avro Users,
>
> My question is whether we can use an avro schema as a wrapper for another
> avro/protobuf binary representation.
>
> Example:
> {
> "namespace": "com.AvroExample",
> "name": "wrapper",
> "type": "record",
> "fields": [
> {"name": "timestamp", "type": "long"},
> {"name": "header", "type": "string"},
> {"name": "body", "type": "bytes"} ]
> }

If you wish to save space, you could use an Enum for the header provided it was only indicating what type the binary is.
Or, you could go even further and use a record for each type, and the wrapper would be a timestamp and a union of the binary types.
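The two alternatives just described might look roughly like the schemas below. This is only a sketch: the enum name "BodyType", its symbols, and the record names "AvroBody", "ProtobufBody", and "JsonBody" are assumptions made for illustration and do not come from the thread.

With an enum header:

```json
{
"namespace": "com.AvroExample",
"name": "wrapper",
"type": "record",
"fields": [
{"name": "timestamp", "type": "long"},
{"name": "header", "type": {"type": "enum", "name": "BodyType", "symbols": ["AVRO", "PROTOBUF", "JSON"]}},
{"name": "body", "type": "bytes"} ]
}
```

With a union of one record per binary type:

```json
{
"namespace": "com.AvroExample",
"name": "wrapper",
"type": "record",
"fields": [
{"name": "timestamp", "type": "long"},
{"name": "body", "type": [
{"type": "record", "name": "AvroBody", "fields": [{"name": "data", "type": "bytes"}]},
{"type": "record", "name": "ProtobufBody", "fields": [{"name": "data", "type": "bytes"}]},
{"type": "record", "name": "JsonBody", "fields": [{"name": "data", "type": "bytes"}]} ]} ]
}
```

An enum value is written as a small integer index rather than a full string, which is where the space saving comes from. In the union form the branch index written ahead of the value plays the role of the header.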
> Then the body can be filled in with the binary representation
> (avro/protobuf/json). Can we wrap the below avro schema being inside the above
> wrapper schema? If so any pointers for it?
>
> {
> "namespace": "com.AvroExample",
> "name": "server",
> "type": "record",
> "fields": [
> { "name" : "status", "type": "string"},
> { "name" : "user", "type": "string"}]
> }

Since you want to have the internal binary be wrapped and optionally be json, avro, or protobuf, you will probably have code that looks something like the below pseudo-code:
DatumReader wrapperReader = <create a datum reader with your chosen api (specific, generic, reflect if Java) to be cached and used to read the wrapper>
Wrapper wrapper = wrapperReader.read(<from the input>);
// extracts the type from the wrapper and figures out if it is avro, protobuf, etc.
// This could be based on a string or enum.
InnerReader inner = getReaderFor(wrapper.getHeader());
// passes the body to the inner reader
inner.read(wrapper.getBody());
The write would be similar.
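For illustration, here is a minimal, dependency-free Java sketch of that read path together with the matching write path, using the "wrapper" and "server" schemas from the question. It hand-rolls the Avro binary encoding instead of using DatumReader/DatumWriter. The class WrapperSketch, its method names, and the header value "avro" are assumptions made for this example, and only the Avro branch of the dispatch is filled in.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: hand-rolled Avro binary encoding, not the Avro library API.
class WrapperSketch {

    // Avro long: zigzag-encode, then write base-128 groups, low group first.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro bytes: the length as a long, followed by the raw bytes.
    static void writeBytes(ByteArrayOutputStream out, byte[] b) {
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    // Avro string: encoded like bytes, with UTF-8 content.
    static void writeString(ByteArrayOutputStream out, String s) {
        writeBytes(out, s.getBytes(StandardCharsets.UTF_8));
    }

    static long readLong(ByteBuffer in) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1); // undo the zigzag step
    }

    static byte[] readBytes(ByteBuffer in) {
        byte[] b = new byte[(int) readLong(in)];
        in.get(b);
        return b;
    }

    static String readString(ByteBuffer in) {
        return new String(readBytes(in), StandardCharsets.UTF_8);
    }

    // Inner "server" record: status, then user.
    static byte[] encodeServer(String status, String user) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, status);
        writeString(out, user);
        return out.toByteArray();
    }

    static String[] decodeServer(byte[] body) {
        ByteBuffer in = ByteBuffer.wrap(body);
        String status = readString(in);
        String user = readString(in);
        return new String[] {status, user};
    }

    // Outer "wrapper" record: timestamp, header, body.
    static byte[] encodeWrapper(long timestamp, String header, byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, timestamp);
        writeString(out, header);
        writeBytes(out, body);
        return out.toByteArray();
    }

    // Reads the wrapper, then picks an inner reader based on the header.
    static String[] readWrapped(byte[] wrapped) {
        ByteBuffer in = ByteBuffer.wrap(wrapped);
        readLong(in); // timestamp, not needed for the dispatch
        String header = readString(in);
        byte[] body = readBytes(in);
        switch (header) {
            case "avro":
                return decodeServer(body);
            default: // protobuf and json readers would be added here
                throw new IllegalArgumentException("no inner reader for header: " + header);
        }
    }
}
```

Calling readWrapped(encodeWrapper(timestamp, "avro", encodeServer("ok", "nikhil"))) returns the two inner fields, "ok" and "nikhil". The wrapper never interprets the body; it only carries the bytes and enough information in the header to choose a reader.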
If you used the enum approach, then on the read and write avro would take care of determining what type the body is, but you would still need to have separate implementations for reading the body.

> Thanks,
> Nikhil
-
Re: Nesting avro with avro or proto binary representations
Shirahatti, Nikhil 2012-02-16, 21:08
Thanks Doug and Scott. I think this answers my question of whether it can be done. Now, is there a template or pattern for how to do it? I see two strategies, as discussed below:
1. Array of items -> items come from the body structure
2. bytes: a serialization of the body based on a type of serialization
Nikhil