Avro, mail # user - Nesting avro with avro or proto binary representations


Re: Nesting avro with avro or proto binary representations
Scott Carey 2012-02-16, 20:06


On 2/15/12 8:23 PM, "Shirahatti, Nikhil" <[EMAIL PROTECTED]> wrote:

> Hello Avro Users,
>
> My question is whether we can use an avro schema as a wrapper for another
> avro/protobuf binary representation.
>
> Example:
> {
>   "namespace": "com.AvroExample",
>   "name": "wrapper",
>   "type": "record",
>   "fields": [
>     {"name": "timestamp", "type": "long"},
>     {"name": "header", "type": "string"},
>     {"name": "body", "type": "bytes"}
>   ]
> }

If you wish to save space, you could use an enum for the header, provided
it only indicates what type the binary is.

Or you could go even further and use a record for each type, so that the
wrapper is a timestamp plus a union of the binary types.
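
To make the space saving concrete, here is a minimal sketch in plain Java that compares the encoded size of a string header with that of an enum header. It is not part of the Avro API: `HeaderSizeSketch`, `BodyType` and the three symbols are hypothetical names. It re-implements only the rules from Avro's binary encoding that matter here: ints and longs are zig-zag varints, a string is a length followed by UTF-8 bytes, and an enum is the zero-based index of its symbol.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not an Avro class: it mimics just enough of Avro's
// binary encoding to compare the size of the two header representations.
final class HeaderSizeSketch {
    // Hypothetical set of body formats; the symbol names are assumptions.
    enum BodyType { AVRO, PROTOBUF, JSON }

    // Avro writes int and long values as variable-length zig-zag integers.
    static byte[] zigZagVarint(long value) {
        long n = (value << 1) ^ (value >> 63);   // zig-zag: small magnitudes stay small
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((n & ~0x7FL) != 0) {              // more than 7 bits left
            out.write((int) ((n & 0x7F) | 0x80)); // low 7 bits plus continuation bit
            n >>>= 7;
        }
        out.write((int) n);
        return out.toByteArray();
    }

    // A string is encoded as its UTF-8 length followed by the UTF-8 bytes.
    static int stringHeaderSize(String header) {
        byte[] utf8 = header.getBytes(StandardCharsets.UTF_8);
        return zigZagVarint(utf8.length).length + utf8.length;
    }

    // An enum is encoded as an int holding the position of the symbol.
    static int enumHeaderSize(BodyType type) {
        return zigZagVarint(type.ordinal()).length;
    }
}
```

Under these assumptions, `HeaderSizeSketch.stringHeaderSize("protobuf")` is 9 bytes per record, while `HeaderSizeSketch.enumHeaderSize(HeaderSizeSketch.BodyType.PROTOBUF)` is 1 byte.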

>
>
> Then the body can be filled in with the binary representation
> (avro/protobuf/json). Can the avro schema below be wrapped inside the
> wrapper schema above? If so, are there any pointers for doing it?
>
> {
>   "namespace": "com.AvroExample",
>   "name": "server",
>   "type": "record",
>   "fields": [
>     {"name": "status", "type": "string"},
>     {"name": "user", "type": "string"}
>   ]
> }

Since you want the inner binary to be wrapped and to be any of json, avro,
or protobuf, you will probably have code that looks something like the
pseudo-code below:

// Create a datum reader with your chosen API (specific, generic, or reflect
// if Java); cache it and reuse it to read the wrapper.
DatumReader wrapperReader = <create a datum reader>;
Wrapper wrapper = wrapperReader.read(<from the input>);

// Extract the type from the wrapper and work out whether the body is avro,
// protobuf, etc. This could be based on a string or an enum.
InnerReader inner = getReaderFor(wrapper.getHeader());

// Pass the body to the inner reader.
inner.read(wrapper.getBody());

The write would be similar.
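
The dispatch step in the pseudo-code can be sketched in plain Java as follows. This is an illustration under stated assumptions, not a definitive implementation: `Wrapper` here is a hand-written stand-in for whatever class your chosen Avro API produces, `BodyType` assumes the enum-header variant, and only a JSON reader (which simply returns the text) is registered, so that the sketch runs without Avro or protobuf on the classpath. The body is a `ByteBuffer` because that is how Avro's Java APIs represent `bytes`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch of the header-based dispatch; none of these types are Avro classes.
final class WrapperDispatchSketch {
    // Assumed enum header naming the format of the body.
    enum BodyType { AVRO, PROTOBUF, JSON }

    // Stand-in for the decoded wrapper record (timestamp, header, body).
    static final class Wrapper {
        final long timestamp;
        final BodyType header;
        final ByteBuffer body;

        Wrapper(long timestamp, BodyType header, ByteBuffer body) {
            this.timestamp = timestamp;
            this.header = header;
            this.body = body;
        }
    }

    // One implementation per inner format.
    interface InnerReader {
        Object read(ByteBuffer body);
    }

    private static final Map<BodyType, InnerReader> READERS = new EnumMap<>(BodyType.class);

    static {
        // Placeholder JSON reader: returns the raw text instead of parsing it.
        // Avro and protobuf readers would be registered here in the same way.
        READERS.put(BodyType.JSON,
                body -> StandardCharsets.UTF_8.decode(body.duplicate()).toString());
    }

    // Maps the header to the reader that understands the body.
    static InnerReader getReaderFor(BodyType header) {
        InnerReader reader = READERS.get(header);
        if (reader == null) {
            throw new IllegalArgumentException("No reader registered for " + header);
        }
        return reader;
    }

    // Reads the body of an already-decoded wrapper with the matching reader.
    static Object readBody(Wrapper wrapper) {
        return getReaderFor(wrapper.header).read(wrapper.body);
    }
}
```

Keeping the readers in a map means a new inner format only needs a new enum symbol and one more registration; the wrapper-reading code stays unchanged.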

If you used the enum approach, then on both read and write Avro would take
care of determining what type the body is, but you would still need
separate implementations for reading the body.

>
>
> Thanks,
>
> Nikhil