Avro data file support in Hadoop
Rahul Bhattacharjee 2012-08-17, 16:25
I was trying to write a MapReduce job to process an Avro data file, which
contains serialized objects conforming to a schema.
The schema is something like this:
1) My plan was to read the Avro data file using AvroMapper.
2) Create a custom key (CustomKey) containing [name, surname, age] and a custom
value (CustomValue) containing [address, country].
3) Collect a Pair of CustomKey and CustomValue.
4) Create an AvroReducer; the expected key is CustomKey and an iterator of
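
The data flow in steps 1 to 4 can be modelled without Hadoop or Avro on the classpath. The sketch below is plain Java and is not the Avro MapReduce API: PlanSketch, Person, splitRecord and groupByKey are hypothetical names, a "|"-joined string stands in for CustomKey, and a two-element list stands in for CustomValue. It only illustrates what the mapper is meant to emit and what the reducer is meant to receive.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of the plan; no Hadoop or Avro classes are used. */
final class PlanSketch {

    /** Hypothetical flat input record with the five fields named in the plan. */
    static final class Person {
        final String name;
        final String surname;
        final int age;
        final String address;
        final String country;

        Person(String name, String surname, int age, String address, String country) {
            this.name = name;
            this.surname = surname;
            this.age = age;
            this.address = address;
            this.country = country;
        }
    }

    /** Steps 2 and 3: split one record into a (key, value) pair. */
    static Map.Entry<String, List<String>> splitRecord(Person p) {
        // Assumption: a "|"-joined string stands in for CustomKey [name, surname, age].
        String key = p.name + "|" + p.surname + "|" + p.age;
        // Assumption: a two-element list stands in for CustomValue [address, country].
        List<String> value = Arrays.asList(p.address, p.country);
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    /** Step 4: gather every value emitted under the same key, as a reducer would see them. */
    static Map<String, List<List<String>>> groupByKey(List<Person> people) {
        Map<String, List<List<String>>> grouped = new TreeMap<>();
        for (Person p : people) {
            Map.Entry<String, List<String>> pair = splitRecord(p);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Person> people = Arrays.asList(
            new Person("Ann", "Lee", 30, "1 Main St", "US"),
            new Person("Ann", "Lee", 30, "2 High St", "UK"),
            new Person("Bob", "Ray", 25, "3 Low Rd", "IN"));
        groupByKey(people).forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

In a real job the key and value would be Avro records described by schemas rather than strings and lists; the model above leaves that part out on purpose.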
I am not able to get the reducer to work. It looks like AvroJob expects the
mapper output key and output value to be of type AvroKey and AvroValue.
I even tried wrapping my custom key and value in AvroKey and
AvroValue, but it didn't help.
From the exception, it looks like the AvroReducer receives an object of type
GenericData.Record.
Going further, I would like to do a secondary sort on age, using
[name, surname] for the grouping comparator together with a custom sort comparator.
I am not sure whether the helpers for Avro processing in Hadoop support secondary
sort, custom Writables, or custom WritableComparables.
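
To make the intended secondary sort concrete, here is a minimal plain-Java sketch of the two comparator roles. It uses no Hadoop or Avro classes and says nothing about whether the Avro helpers honor custom comparators. SecondarySortSketch, PersonKey, SORT_ORDER, GROUPING and sortAndGroup are hypothetical names. The sort comparator orders keys by [name, surname, age]; the grouping comparator looks only at [name, surname], so each group holds one person's keys in ascending age order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of secondary sort; no Hadoop or Avro classes are used. */
final class SecondarySortSketch {

    /** Hypothetical stand-in for the CustomKey [name, surname, age]. */
    static final class PersonKey {
        final String name;
        final String surname;
        final int age;

        PersonKey(String name, String surname, int age) {
            this.name = name;
            this.surname = surname;
            this.age = age;
        }
    }

    /** Sort comparator: full order on [name, surname, age]. */
    static final Comparator<PersonKey> SORT_ORDER =
        Comparator.comparing((PersonKey k) -> k.name)
                  .thenComparing(k -> k.surname)
                  .thenComparingInt(k -> k.age);

    /** Grouping comparator: only [name, surname]; age is ignored. */
    static final Comparator<PersonKey> GROUPING =
        Comparator.comparing((PersonKey k) -> k.name)
                  .thenComparing(k -> k.surname);

    /** Sorts the keys, then starts a new group whenever the grouping comparator sees a change. */
    static List<List<PersonKey>> sortAndGroup(List<PersonKey> keys) {
        List<PersonKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT_ORDER);
        List<List<PersonKey>> groups = new ArrayList<>();
        PersonKey previous = null;
        for (PersonKey key : sorted) {
            if (previous == null || GROUPING.compare(previous, key) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up sample keys, for illustration only.
        List<PersonKey> keys = Arrays.asList(
            new PersonKey("Ann", "Lee", 40),
            new PersonKey("Bob", "Ray", 25),
            new PersonKey("Ann", "Lee", 30));
        for (List<PersonKey> group : sortAndGroup(keys)) {
            StringBuilder ages = new StringBuilder();
            for (PersonKey k : group) {
                ages.append(k.age).append(' ');
            }
            System.out.println(group.get(0).name + " " + group.get(0).surname + ": " + ages.toString().trim());
        }
    }
}
```

In a Hadoop job these two roles would correspond to the sort comparator and the grouping comparator set on the job configuration; the sketch only shows the ordering and grouping logic they would need to implement.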
Any pointers regarding Avro data processing using Hadoop would be greatly
appreciated.
Please let me know if any more information is required from me.