Avro user mailing list
Re: Hadoop serialization DatumReader/Writer
Scott Carey <[EMAIL PROTECTED]> writes:

> Making the DatumReader/Writers configurable would be a welcome
> addition.

Excellent!

> Ideally, much more of what goes on there could be:
>  1. configuration driven
>  2. pre-computed to avoid repeated work during decoding/encoding
>
> We do some of both already.  The trick is to do #1 without impacting
> performance and #2 requires a bigger overhaul.

Which work in particular?  In my pass through the AvroSerialization
implementation so far, it looks like each MR task would create either
one or two Serializers/Deserializers (key and value), each of which in
turn would create one DatumWriter/DatumReader and Encoder/Decoder pair.
Or do De/Serializers get created multiple times per task?
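
To make the question concrete, here is a minimal sketch of the lifecycle
as I currently read it.  Every type below (SketchSerializer,
SketchDatumWriter, SketchEncoder) is a hypothetical stand-in written for
this illustration, not the actual Avro or Hadoop class, and the
"one key plus one value serializer per task" shape is exactly the
assumption I am asking about:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for illustration only; NOT the Avro/Hadoop classes.
class LifecycleSketch {
    // Counters so we can see how many objects one simulated task creates.
    static final AtomicInteger writersCreated = new AtomicInteger();
    static final AtomicInteger encodersCreated = new AtomicInteger();

    static class SketchEncoder {
        final StringBuilder buffer = new StringBuilder();
        SketchEncoder() { encodersCreated.incrementAndGet(); }
    }

    static class SketchDatumWriter {
        SketchDatumWriter() { writersCreated.incrementAndGet(); }
        void write(Object datum, SketchEncoder out) {
            out.buffer.append(datum).append('\n');
        }
    }

    static class SketchSerializer {
        private SketchDatumWriter writer;
        private SketchEncoder encoder;

        // Assumed: the writer/encoder pair is built once, when the
        // serializer is opened, and reused for every record after that.
        void open() {
            writer = new SketchDatumWriter();
            encoder = new SketchEncoder();
        }

        void serialize(Object datum) {
            writer.write(datum, encoder);
        }
    }

    // Simulates one task: one key serializer, one value serializer.
    // Returns {writers created, encoders created}.
    static int[] runTask(int records) {
        writersCreated.set(0);
        encodersCreated.set(0);
        SketchSerializer keySerializer = new SketchSerializer();
        SketchSerializer valueSerializer = new SketchSerializer();
        keySerializer.open();
        valueSerializer.open();
        for (int i = 0; i < records; i++) {
            keySerializer.serialize("key-" + i);
            valueSerializer.serialize(i);
        }
        return new int[] { writersCreated.get(), encodersCreated.get() };
    }
}
```

If that reading is right, the per-task setup cost is independent of the
number of records, so a configuration lookup done at open time would
not sit on the per-record path.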

> If you would like, a contribution including a Clojure related maven
> module or two that depends on the Java stuff would be a welcome
> addition and allow us to identify compatibility issues as we change
> the Java library over time.

That sounds like a great end-goal.  Right now at the company I work for
(Damballa) we've just started getting our feet wet with Avro.  Avro won
our serialization-format bake-off, but we haven't started actually using
it.  I just finished an initial pass at Avro-Clojure integration and we
have released it under an open source license:

    https://github.com/damballa/abracad

I would very much like to eventually get an iteration of it into Avro
proper, but I want to actually start using it and Avro first, so we can
hammer out any interface issues, etc.

Anyway, I'll try to work up a patch to add some more configuration hooks
to the AvroSerialization.  Should I also create a ticket in the Avro
issue tracker?
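
As a rough idea of the kind of hook I have in mind, here is a sketch
only.  The configuration key, the WriterFactory interface, and all class
names are hypothetical and made up for this example; a plain Map stands
in for the job configuration, and none of this is existing Avro API:

```java
import java.util.Map;

// Hypothetical sketch of a configuration-driven writer factory.
class ConfigHookSketch {
    // Made-up configuration key, for illustration only.
    static final String WRITER_FACTORY_KEY = "sketch.datum.writer.factory";

    interface SketchWriter {
        String write(Object datum);
    }

    interface WriterFactory {
        SketchWriter create();
    }

    // Used when the configuration does not name a factory.
    static class DefaultFactory implements WriterFactory {
        public SketchWriter create() {
            return datum -> String.valueOf(datum);
        }
    }

    // An alternative a job could select through configuration.
    static class TaggedFactory implements WriterFactory {
        public SketchWriter create() {
            return datum -> "tagged:" + datum;
        }
    }

    // Looks up the factory class name once and instantiates it
    // reflectively, so the lookup cost is paid at setup time rather
    // than per record.
    static SketchWriter writerFor(Map<String, String> conf) {
        String name = conf.getOrDefault(
                WRITER_FACTORY_KEY, DefaultFactory.class.getName());
        try {
            WriterFactory factory = (WriterFactory) Class.forName(name)
                    .getDeclaredConstructor()
                    .newInstance();
            return factory.create();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "Cannot create writer factory: " + name, e);
        }
    }
}
```

A job would then pick a different implementation by setting the key to
a factory class name, for example TaggedFactory.class.getName(), and
leave it unset to keep the default behaviour.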

-Marshall