Avro >> mail # user >> Hadoop serialization DatumReader/Writer


Re: Hadoop serialization DatumReader/Writer
Scott Carey <[EMAIL PROTECTED]> writes:

> Making the DatumReader/Writers configurable would be a welcome
> addition.

Excellent!

> Ideally, much more of what goes on there could be:
>  1. configuration driven
>  2. pre-computed to avoid repeated work during decoding/encoding
>
> We do some of both already.  The trick is to do #1 without impacting
> performance and #2 requires a bigger overhaul.

Which work in particular?  In my pass through the AvroSerialization
implementation so far, it looks like each MR task would create either
one or two Serializers/Deserializers (key and value), each of which in
turn would create one DatumWriter/DatumReader and Encoder/Decoder pair.
Or do De/Serializers get created multiple times per task?
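
The lifecycle described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `StubSerializer` and `StubWriter` are hypothetical stand-ins rather than Avro or Hadoop classes, although `open`/`serialize`/`close` mirror the shape of Hadoop's `Serializer` interface. The sketch assumes the task creates exactly one serializer each for key and value, which is the very point the question above asks about.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

public class SerializerLifecycleSketch {
    // Counts writer/encoder pairs built, so the per-task cost is visible.
    static final AtomicInteger writersCreated = new AtomicInteger();

    /** Hypothetical stand-in for one DatumWriter plus its Encoder. */
    static final class StubWriter {
        private final DataOutputStream out;

        StubWriter(OutputStream out) {
            this.out = new DataOutputStream(out);
            writersCreated.incrementAndGet();
        }

        void write(String datum) throws IOException {
            out.writeUTF(datum);
        }

        void flush() throws IOException {
            out.flush();
        }
    }

    /** Hypothetical stand-in for a serializer with an open/serialize/close lifecycle. */
    static final class StubSerializer {
        private StubWriter writer;

        // The writer/encoder pair is built once, when the stream is opened.
        void open(OutputStream out) {
            writer = new StubWriter(out);
        }

        // Every record reuses the pair created in open().
        void serialize(String datum) throws IOException {
            writer.write(datum);
        }

        void close() throws IOException {
            writer.flush();
        }
    }

    /** Simulates one task writing `records` key/value pairs; returns pairs created. */
    static int simulateTask(int records) {
        writersCreated.set(0);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        StubSerializer key = new StubSerializer();
        StubSerializer value = new StubSerializer();
        try {
            key.open(sink);
            value.open(sink);
            for (int i = 0; i < records; i++) {
                key.serialize("k" + i);
                value.serialize("v" + i);
            }
            key.close();
            value.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return writersCreated.get();
    }

    public static void main(String[] args) {
        System.out.println("writer/encoder pairs created: " + simulateTask(1000));
    }
}
```

Under that assumption the setup cost is paid twice per task regardless of record count. If the framework instead created serializers more than once per task, the count would grow accordingly.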

> If you would like, a contribution including a Clojure related maven
> module or two that depends on the Java stuff would be a welcome
> addition and allow us to identify compatibility issues as we change
> the Java library over time.

That sounds like a great end-goal.  Right now at the company I work for
(Damballa) we've just started getting our feet wet with Avro.  Avro won
our serialization-format bake-off, but we haven't started actually using
it.  I just finished an initial pass at Avro-Clojure integration and we
have released it under an open source license:

    https://github.com/damballa/abracad

I would very much like to eventually get an iteration of it into Avro
proper, but I want to actually start using it and Avro first, so we can
hammer out any interface issues etc.

Anyway, I'll try to work up a patch to add some more configuration hooks
to the AvroSerialization.  Should I also create a ticket in the Avro
issue tracker?

-Marshall