I have been using Avro with Hadoop and Oozie for months now and I am very
happy with the results.
The only limitation I see now is that we specify Avro schemas in the job
files.
Since this info is already provided in Mapper/Reducer signatures, I see this
as redundant. The schema is also present in all my serialized files, which
means that the schema is specified in 3 different places.
From an operational point of view, this is a pain, since any schema
modification (say, adding a simple optional field) forces me to update many
job files. This task is very error-prone, and since we have a large number
of jobs, it generates a lot of work.
The only solution I see now would be a find/replace in the build script,
but I hope to find a better solution, either by providing some generic
schemas to the job file or by deactivating schema validation in the job.
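For reference, the find/replace workaround could look roughly like the
sketch below. It is only an illustration under assumptions: the @NAME@
placeholder syntax, the SchemaTemplate class and the render method are
made up for this example, and the job file fragment assumes the schema is
passed through a property such as avro.input.schema. The idea is to keep
each schema in a single place and substitute it into job file templates
at build time.

```java
import java.util.Map;

public class SchemaTemplate {

    // Replaces every @NAME@ token in the template with the schema text
    // registered under NAME. The @NAME@ syntax is a hypothetical convention.
    static String render(String template, Map<String, String> schemas) {
        String out = template;
        for (Map.Entry<String, String> e : schemas.entrySet()) {
            out = out.replace("@" + e.getKey() + "@", e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed shape of a job file fragment; real job files may differ.
        String template =
            "<property><name>avro.input.schema</name>"
            + "<value>@USER_SCHEMA@</value></property>";

        // In a real build this text would be read from the one schema file.
        String userSchema =
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":"
            + "[{\"name\":\"id\",\"type\":\"long\"}]}";

        System.out.println(render(template, Map.of("USER_SCHEMA", userSchema)));
    }
}
```

This still duplicates the schema in every generated job file, so it only
removes the manual editing, not the redundancy itself.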
Any help will be appreciated!