Avro >> mail # user >> is this a bug?


RE: is this a bug?

I did some more investigation. I found weird behavior in the readString() method of BinaryDecoder.java in the Avro source code when the statement record.put("rowkey", key) is present in the reduce() method. Does this mean that there is a bug in BinaryDecoder.java? Thanks.
Ey-Chih Chow

From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: RE: is this a bug?
Date: Fri, 4 Mar 2011 00:48:55 -0800
Below are fragments of the trace logs of our MR jobs, with and without the statement 'record.put("rowkey", key)' mentioned in the previous messages. The difference shows in the last line of each log, which is logged at the entry of the reduce() method. For the first fragment, that line is 'working on 0000000200000000000000000000000000002 whose rowKey is 0000000300000000000000000000000000003'; for the second fragment, it is 'working on 0000000200000000000000000000000000002 whose rowKey is 0000000200000000000000000000000000002'. The second is what we expected, since it corresponds to the correct key/values pair passed to the reduce() method. Note that both fragments were generated by adding extra log statements to the Hadoop and Avro source code.

Can anybody help determine whether this is a bug in the Avro or Hadoop code?
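
One possible explanation, offered only as a sketch and not confirmed anywhere in this thread: Hadoop typically reuses key and value objects between calls, and Avro's string type (Utf8) is mutable, so storing the key object itself in a record can leave the record field aliased to an object that is overwritten later. The class below is a self-contained illustration of that aliasing hazard. MutableKey, storeByReference() and storeCopy() are hypothetical names made up for this example; they stand in for the real Avro and Hadoop classes, and the code does not reproduce the exact sequence of events in the logs.

```java
import java.util.HashMap;
import java.util.Map;

public class ReusedKeyDemo {

    // Hypothetical stand-in for a mutable string type such as Avro's Utf8.
    static final class MutableKey {
        private final StringBuilder chars = new StringBuilder();

        // Overwrites the contents in place, the way a decoder may reuse an old instance.
        MutableKey set(String s) {
            chars.setLength(0);
            chars.append(s);
            return this;
        }

        @Override
        public String toString() {
            return chars.toString();
        }
    }

    // Stores the key object itself, then simulates the framework reusing it.
    static String storeByReference() {
        MutableKey reused = new MutableKey().set("0000000200000000000000000000000000002");
        Map<String, Object> record = new HashMap<>();
        record.put("rowkey", reused);                         // the field now aliases the key
        reused.set("0000000300000000000000000000000000003");  // next key lands in the same object
        return record.get("rowkey").toString();
    }

    // Stores an immutable copy, so later reuse of the key cannot change the field.
    static String storeCopy() {
        MutableKey reused = new MutableKey().set("0000000200000000000000000000000000002");
        Map<String, Object> record = new HashMap<>();
        record.put("rowkey", reused.toString());              // defensive copy
        reused.set("0000000300000000000000000000000000003");
        return record.get("rowkey").toString();
    }

    public static void main(String[] args) {
        System.out.println("by reference: " + storeByReference());
        System.out.println("by copy:      " + storeCopy());
    }
}
```

If this is indeed what happens in the job, copying the key before storing it (for example via key.toString(), assuming the key is an Avro Utf8) would be one way to check: the 'whose rowKey is' log line should then match the key again.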

=============================================================================================================
Log fragment with the statement 'record.put("rowkey", key)'

2011-03-03 18:00:00,180 INFO org.apache.hadoop.mapred.ReduceTask: trace bug isSkipping():false
2011-03-03 18:00:00,190 INFO org.apache.avro.mapred.AvroSerialization: trace bug deserialize() reader org.apache.avro.specific.SpecificDatumReader@1a001ff
2011-03-03 18:00:00,198 INFO org.apache.avro.generic.GenericDatumReader: trace bug type of expected STRING
2011-03-03 18:00:00,199 INFO org.apache.avro.mapred.AvroSerialization: trace bug deserialized datum 0000000000000000000000000000000000000
2011-03-03 18:00:00,199 INFO org.apache.hadoop.mapred.TaskRunner: trace bug1 deserializer is org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer@1abcc03
2011-03-03 18:00:00,199 INFO org.apache.hadoop.mapred.TaskRunner: trace bug1 key is 0000000000000000000000000000000000000
2011-03-03 18:00:00,199 INFO org.apache.hadoop.mapred.ReduceTask: trace bug done with set values
2011-03-03 18:00:00,199 INFO org.apache.hadoop.mapred.ReduceTask: trace bug key is 0000000000000000000000000000000000000 values is org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator@1deeb40
2011-03-03 18:00:00,199 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: work on key 0000000000000000000000000000000000000
2011-03-03 18:00:00,199 INFO org.apache.avro.mapred.AvroSerialization: trace bug deserialize() reader org.apache.avro.specific.SpecificDatumReader@26e9f9
2011-03-03 18:00:00,208 INFO org.apache.avro.generic.GenericDatumReader: trace bug type of expected RECORD
2011-03-03 18:00:00,208 INFO org.apache.avro.mapred.AvroSerialization: trace bug deserialized datum {"rowKey": "0000000000000000000000000000000000000", "tableName": null, "Games__": [{"columnName": "0_TESTFAM_TESTSKU_1.5", "columnValue": {"bytes": "ame": "hwty", "columnValue": "stringvalue"}, {"columnName": "loc", "columnValue": "stringvalue"}, {"columnName": "osrev", "columnValue": "stringvalue"}, {"columnName": "tz", "columnValue": "stringvalue"}], "PlayerState__": [{"columnName": "0_TESTFAM_TESTSKU_1.0=GC=2010:01:01:07", "columnValue": "{"mojo":10,"afloat":1.99,"hat":"red"}", "timestamp": 123456789}, {"columnName": "0_TESTFAM_TESTSKU_1.0=GS=2010:01:01:07", "columnValue": "{"mojo":10,"afloat":1.99,"hat":"red"}", "timestamp": 123456799}], "ClientSessions__": null, "ServerSessions__": null, "Monetization__": null}
2011-03-03 18:00:00,208 INFO org.apache.hadoop.mapred.TaskRunner: trace bug1 value is {"rowKey": "0000000000000000000000000000000000000", "tableName": null,"Games__": [{"columnName": "0_TESTFAM_TESTSKU_1.5", "columnValue": {"bytes": "ame": "hwty", "columnValue": "stringvalue"}, {"columnName": "loc", "columnValue": "stringvalue"}, {"columnName": "osrev", "columnValue": "stringvalue"}, {"columnName": "tz", "columnValue": "stringvalue"}], "PlayerState__": [{"columnName":"0_TESTFAM_TESTSKU_1.0=GC=2010:01:01:07", "columnValue": "{"mojo":10,"afloat":1.99,"hat":"red"}", "timestamp": 123456789}, {"columnName":"0_TESTFAM_TESTSKU_1.0=GS=2010:01:01:07", "columnValue": "{"mojo":10,"afloat":1.99,"hat":"red"}", "timestamp": 123456799}], "ClientSessions__": null, "ServerSessions__": null, "Monetization__": null}
2011-03-03 18:00:00,208 INFO org.apache.hadoop.mapred.Merger: trace bug adjust priority queue
2011-03-03 18:00:00,208 INFO org.apache.avro.mapred.AvroSerialization: trace bug deserialize() reader org.apache.avro.specific.SpecificDatumReader@1a001ff
2011-03-03 18:00:00,208 INFO org.apache.avro.generic.GenericDatumReader: trace bug type of expected STRING
2011-03-03 18:00:00,209 INFO org.apache.avro.mapred.AvroSerialization: trace bug deserialized datum 0000000100000000000000000000000000001
2011-03-03 18:00:00,209 INFO org.apache.hadoop.mapred.TaskRunner: trace bug1 deserializer is org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer@1abcc03
2011-03-03 18:00:00,209 INFO org.apache.hadoop.mapred.TaskRunner: trace bug1 key is 0000000100000000000000000000000000001
2011-03-03 18:00:00,210 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000000000000000000000000000000000 whose rowKey is 0000000000000000000000000000000000000
2011-03-03 18:00:00,215 INFO org.apache.hadoop.mapred.ReduceTask: trace bug call nextKey()
2011-03-03 18:00:00,215 INFO org.apache.hadoop.mapred.ReduceTask: trace bug key is 0000000100000000000000000000000000001 values is org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator@1deeb40
2011-03-03 18:00:00,215 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: work on key 0000000100000000000000000000000000001
2011-03-03 18:00:00,216 IN