Avro >> mail # user >> Specific/GenericDatumReader performance and resolving decoders

Re: Specific/GenericDatumReader performance and resolving decoders
I think this approach makes sense, reader=writer is common.  In addition to
record fields, unions are affected.

I have been thinking for a while about the issue that resolving records is
slower than not resolving them.  In theory, it could be just as fast because you can
pre-compute the steps needed and bake that into the reading logic.  This
seems like a reasonable way to avoid the cost for the case where schemas

Please open a JIRA ticket and put your preliminary thoughts there.  It is a
good place to discuss the technical bits of the issue even before you have a

On 4/19/12 2:09 AM, "Irving, Dave" <[EMAIL PROTECTED]> wrote:

> Hi,
> Recently I've been looking at the performance of Avro's
> SpecificDatumReaders/Writers. In our use cases, when deserializing, we find it
> quite usual for reader / writer schemas to be identical. Interestingly,
> GenericDatumReader bakes in the use of ResolvingDecoders right into its core.
> So even if constructed with a single (reader/writer) schema, a
> ResolvingDecoder is still used.
> I experimented a little, and wrote a SpecificDatumReader which instead of
> being hard wired with a ResolvingDecoder, uses a DecodeStrategy, leaving the
> reader only dealing with Decoders directly.
> Details follow, but for 'same schema' decodes, the performance difference is
> impressive. For the types of records I deal with, a decode with reader schema
> == writer schema using this approach is about 1.6x faster than a standard
> SpecificDatumReader decode.
> interface DecodeStrategy
> {
>   Decoder configureForRead(Decoder in) throws IOException;
>   void readComplete() throws IOException;
>   void decodeRecordFields(Object old, SpecificRecord record, Schema expected,
>       Decoder in, SpecificDatumReader2 reader) throws IOException;
> }
> The idea is that when we hit a record, instead of getting field order from a
> ResolvingDecoder directly, we just let the decode strategy do it for us
> (calling back to the reader for each field, which allows recursion).
> For example, when we know reader / writer schemas are identical, and we don't want
> validation, an IdentitySchemaDecodeStrategy#decodeRecordFields can just pull
> the fields directly from the provided record schema (calling back on the reader
> for each one):
> ...
> void decodeRecordFields(......)
> {
>   List<Field> fields = expected.getFields();
>   for (int i = 0, len = fields.size(); i < len; ++i)
>   {
>     Field field = fields.get(i);
>     reader.readField(old, in, field, record);
>   }
> }
> ...
> The resolving decoder implementation of this strategy just does a 'readFieldOrder' like
> GenericDatumReader does today.
> For each read (given a Decoder), the datum reader lets the decode strategy
> return the actual decoder to be used (via #configureForRead). This means
> that a resolving implementation can use this hook to configure the
> ResolvingDecoder and return it.
> The result is that the datum reader can work with same schema / validated
> schema / resolved schemas seamlessly without caring about the difference.
> I thought I'd share the approach before working on a full patch: Is this an
> approach you'd be interested in taking back to core Avro? Or is it a little
> niche? :)
> Cheers,
> Dave
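
For readers following the thread, the strategy split described above can be sketched without any Avro dependency. This is only a toy model under stated assumptions: field names stand in for a schema, a queue of ints stands in for a Decoder, and every type name below (FieldOrderStrategy, IdentityStrategy, ResolvingStrategy, StrategyDemo) is hypothetical and not part of Avro or of Dave's patch. It shows only the core idea: the reader delegates "which fields, in which order" to a strategy, so the same-schema path never pays for resolution.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for the proposed DecodeStrategy; hypothetical, not Avro API. */
interface FieldOrderStrategy {
    /** Reads one record's values from `in` into a map keyed by reader field name. */
    Map<String, Integer> decodeRecordFields(List<String> readerFields, Deque<Integer> in);
}

/** Reader schema == writer schema: walk the reader's own field list, no resolution. */
final class IdentityStrategy implements FieldOrderStrategy {
    @Override
    public Map<String, Integer> decodeRecordFields(List<String> readerFields, Deque<Integer> in) {
        Map<String, Integer> record = new LinkedHashMap<>();
        for (String field : readerFields) {
            record.put(field, in.removeFirst());
        }
        return record;
    }
}

/** Schemas differ: follow the writer's order and skip fields the reader does not declare. */
final class ResolvingStrategy implements FieldOrderStrategy {
    private final List<String> writerFields;

    ResolvingStrategy(List<String> writerFields) {
        this.writerFields = writerFields;
    }

    @Override
    public Map<String, Integer> decodeRecordFields(List<String> readerFields, Deque<Integer> in) {
        Map<String, Integer> record = new LinkedHashMap<>();
        for (String field : writerFields) {
            int value = in.removeFirst();      // always consume what the writer wrote
            if (readerFields.contains(field)) {
                record.put(field, value);      // keep only fields the reader knows
            }
        }
        return record;
    }
}

/** Minimal "datum reader": it only talks to the strategy, never to a resolver directly. */
final class StrategyDemo {
    static Map<String, Integer> read(FieldOrderStrategy strategy,
                                     List<String> readerFields, int... encoded) {
        Deque<Integer> in = new ArrayDeque<>();
        for (int v : encoded) {
            in.addLast(v);
        }
        return strategy.decodeRecordFields(readerFields, in);
    }
}
```

A caller would pick IdentityStrategy when it knows the two schemas match and ResolvingStrategy otherwise; the reading code in StrategyDemo.read is the same in both cases, which mirrors the "seamless" property described in the message.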