Re: Hadoop serialization DatumReader/Writer
Marshall Bockrath-Vandegr... 2013-05-13, 23:22
Scott Carey <[EMAIL PROTECTED]> writes:
> Making the DatumReader/Writers configurable would be a welcome addition.
> Ideally, much more of what goes on there could be:
> 1. configuration driven
> 2. pre-computed to avoid repeated work during decoding/encoding
> We do some of both already. The trick is to do #1 without impacting
> performance and #2 requires a bigger overhaul.
Which work in particular? In my pass through the AvroSerialization
implementation so far, it looks like each MR task would create either
one or two Serializers/Deserializers (key and value), each of which in
turn would create one DatumWriter/DatumReader and Encoder/Decoder pair.
Or do De/Serializers get created multiple times per task?
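
To make the lifecycle in question concrete, here is a minimal sketch of the assumption described above: one serializer each for key and value, each building its writer once and reusing it for every record. The types SketchSerializer and SketchDatumWriter are hypothetical stand-ins written for illustration, not Avro's or Hadoop's actual classes, and the sketch does not settle whether the real De/Serializers are created more than once per task.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

public class SerializerLifecycleSketch {
    // Counts writer constructions so the "once per serializer" assumption is visible.
    static final AtomicInteger WRITERS_CREATED = new AtomicInteger();

    /** Hypothetical stand-in for a datum writer; NOT Avro's DatumWriter. */
    static final class SketchDatumWriter {
        SketchDatumWriter() { WRITERS_CREATED.incrementAndGet(); }

        void write(String datum, ByteArrayOutputStream out) {
            byte[] bytes = datum.getBytes(StandardCharsets.UTF_8);
            out.write(bytes.length);            // one length byte: short demo strings only
            out.write(bytes, 0, bytes.length);
        }
    }

    /** Hypothetical stand-in for a serializer that is opened once and then reused. */
    static final class SketchSerializer {
        // Built once per serializer, not once per record.
        private final SketchDatumWriter writer = new SketchDatumWriter();
        private ByteArrayOutputStream out;

        void open(ByteArrayOutputStream out) { this.out = out; }
        void serialize(String datum) { writer.write(datum, out); }
    }

    /** Simulates one task writing `records` key/value pairs; returns writers created. */
    public static int writersCreatedFor(int records) {
        int before = WRITERS_CREATED.get();
        SketchSerializer key = new SketchSerializer();
        SketchSerializer value = new SketchSerializer();
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        key.open(sink);
        value.open(sink);
        for (int i = 0; i < records; i++) {
            key.serialize("k" + i);
            value.serialize("v" + i);
        }
        return WRITERS_CREATED.get() - before;
    }

    public static void main(String[] args) {
        // prints "writers created: 2" regardless of the record count
        System.out.println("writers created: " + writersCreatedFor(1000));
    }
}
```

Under this assumption the per-task setup cost is fixed at two writers, so any repeated work would have to happen per record inside the write path rather than in construction.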
> If you would like, a contribution including a Clojure related maven
> module or two that depends on the Java stuff would be a welcome
> addition and allow us to identify compatibility issues as we change
> the Java library over time.
That sounds like a great end-goal. Right now at the company I work for
(Damballa) we've just started getting our feet wet with Avro. Avro won
our serialization-format bake-off, but we haven't started actually using
it. I just finished an initial pass at Avro-Clojure integration and we
have released it under an open source license:
I would very much like to eventually get an iteration of it into Avro
proper, but I want to actually start using it and Avro first, so we can
hammer out any interface issues etc.
Anyway, I'll try to work up a patch to add some more configuration hooks
to the AvroSerialization. Should I also create a ticket in the Avro