Avro >> mail # user >> is this a bug?

is this a bug?

I am working on an Avro MR job and encountering an issue with AvroReducer<Utf8, GenericRecord, GenericRecord>. The corresponding reduce() routine is implemented in the following way:
public void reduce(Utf8 key, Iterable<GenericRecord> values, AvroCollector<GenericRecord> collector, Reporter reporter) throws IOException {
    .
    .
    .
    GenericRecord record = null;
    for (GenericRecord value : values) {
        .
        .
        .
        record = value;
        record.put("rowkey", key);
        .
        .
        .
        collector.collect(record);
    }
}
If I comment out the statement in red in the above code, the reduce function is called properly, with the CORRECT key/values pairs passed to reduce(). However, if I add the statement in red back to the routine, the reduce function is called with the WRONG key/values pairs, in the sense that key2 is paired with values3, instead of values2, when passed to reduce().

I traced this problem by stepping through the Hadoop source code, such as ReduceTask.java and Task.java, and the Avro source code, such as HadoopReducer.java, HadoopReducerBase.java, and all the serialization code. The problem shows up on the second call of reduce(), but I cannot locate the exact place that causes it. My intuition is that it is introduced either by the Hadoop iterators after the merge sort or by Avro deserialization. Can anybody help me with this? Thanks.
Ey-Chih Chow
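
One hypothesis that could be tested for the symptom described above is object reuse: Hadoop's reduce-side iterators reuse key and value objects between calls, so storing a reference to the key inside a value record may let a later in-place deserialization overwrite the key. The following is only a minimal, self-contained sketch of that aliasing hazard. It assumes the statement in question is the record.put("rowkey", key) call, and every name in it (ReuseAliasingSketch, MutableText, decodeInto, keyAfterReduce) is hypothetical; it does not use or reproduce the actual Avro or Hadoop code.

```java
import java.util.HashMap;
import java.util.Map;

public class ReuseAliasingSketch {

    /** Hypothetical stand-in for a mutable string type such as Utf8. */
    static final class MutableText {
        private String s;
        MutableText(String s) { this.s = s; }
        MutableText(MutableText other) { this.s = other.s; } // defensive copy
        void set(String s) { this.s = s; }
        @Override public String toString() { return s; }
    }

    /**
     * Hypothetical decoder (an assumption, not Avro's real reader): it refills
     * the reused record in place, overwriting the existing field object
     * instead of allocating a new one.
     */
    static Map<String, MutableText> decodeInto(Map<String, MutableText> reuse, String payload) {
        MutableText old = reuse.get("rowkey");
        if (old != null) {
            old.set(payload);            // mutates whatever object the field points to
        } else {
            reuse.put("rowkey", new MutableText(payload));
        }
        return reuse;
    }

    /** Returns the content of the key after two values have been processed. */
    static String keyAfterReduce(boolean copyKey) {
        MutableText key = new MutableText("key1");
        Map<String, MutableText> record = new HashMap<>();
        String[] incoming = {"v1", "v2"};
        for (String payload : incoming) {
            record = decodeInto(record, payload);
            // Storing the key itself aliases it with the record's field;
            // storing a copy keeps the two objects independent.
            record.put("rowkey", copyKey ? new MutableText(key) : key);
        }
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(keyAfterReduce(false)); // prints "v2": the key was overwritten
        System.out.println(keyAfterReduce(true));  // prints "key1": the key is intact
    }
}
```

If this hypothesis applies to the real job, replacing record.put("rowkey", key) with a call that stores a copy of the key (for example, one built from key.toString()) would be a quick way to check it. Whether the Avro deserializer or the Hadoop iterator actually behaves this way here is not established by the sketch.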