NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Avro >> mail # user >> is this a bug?


RE: is this a bug?

I have created a unit test that reproduces the problem. Do you want me to file a bug for this? Thanks.
Ey-Chih Chow

From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Date: Thu, 10 Mar 2011 16:01:39 -0800
Subject: Re: is this a bug?
One thing that could be related is that Hadoop re-uses objects under the covers, so modifying an object passed to reduce() and then passing it on may not behave as expected. Your work-around below seems to indicate that the problem may be related to object re-use.
The easiest way for us to figure this out is to have a reproducible test case. If you can provide a patch that adds a failing unit test to Avro, that would help greatly. A unit test would probably be the easiest option on your end, since Avro already has M/R unit tests that do most of the work of configuring and running a simple M/R job.
-Scott
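Scott's point about object re-use can be illustrated without Hadoop at all. The following is a minimal, self-contained Java sketch, not Hadoop or Avro code: ObjectReuseDemo and ReusingIterator are hypothetical names, and StringBuilder stands in for a mutable type such as Avro's Utf8. It shows that holding on to an object the framework later overwrites leaves every saved entry aliasing the most recent record, while copying each value first keeps them distinct.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Illustrative sketch only: all names here are hypothetical. StringBuilder stands in
// for a mutable record type such as Avro's Utf8, and ReusingIterator mimics a framework
// that deserializes every record into one shared object instead of allocating new ones.
public class ObjectReuseDemo {

    /** Hands out the same StringBuilder on every call, overwriting its contents. */
    static final class ReusingIterator implements Iterator<StringBuilder> {
        private final String[] records;
        private final StringBuilder shared = new StringBuilder(); // the one reused instance
        private int pos = 0;

        ReusingIterator(String... records) {
            this.records = records;
        }

        @Override
        public boolean hasNext() {
            return pos < records.length;
        }

        @Override
        public StringBuilder next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            shared.setLength(0);           // discard the previous record
            shared.append(records[pos++]); // refill the same object in place
            return shared;
        }
    }

    /** Hazard: keeps references, so every kept entry aliases the last record read. */
    public static List<String> keepReferences(String... records) {
        List<StringBuilder> kept = new ArrayList<>();
        ReusingIterator it = new ReusingIterator(records);
        while (it.hasNext()) {
            kept.add(it.next()); // stores the shared instance, not its current value
        }
        List<String> out = new ArrayList<>();
        for (StringBuilder sb : kept) {
            out.add(sb.toString());
        }
        return out;
    }

    /** Safe: copies each value before keeping it, in the spirit of new Utf8(key.toString()). */
    public static List<String> keepCopies(String... records) {
        List<String> out = new ArrayList<>();
        ReusingIterator it = new ReusingIterator(records);
        while (it.hasNext()) {
            out.add(it.next().toString()); // toString() yields an independent String
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(keepReferences("k0", "k1", "k2")); // prints [k2, k2, k2]
        System.out.println(keepCopies("k0", "k1", "k2"));     // prints [k0, k1, k2]
    }
}
```

The copy costs one allocation per record, which is exactly what a re-using framework tries to avoid, so it is usually worth paying only for values that must outlive the current call.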
On 3/10/11 3:16 PM, "ey-chih chow" <[EMAIL PROTECTED]> wrote:

After I made the change mentioned in my previous message, the MR job ran. However, this did not fix the problem I mentioned at the beginning of this thread. I got the following output from the reducer:
====================================================================================================
attempt_20110310145147365_0002_r_000000_0/syslog:2011-03-10 14:52:31,226 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000000000000000000000000000000000 whose rowKey is 0000000000000000000000000000000000000
attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,010 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000000000000000000000000000000000 whose rowKey is 0000000000000000000000000000000000000
attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,016 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000100000000000000000000000000001 whose rowKey is 0000000200000000000000000000000000002
attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,017 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000200000000000000000000000000002 whose rowKey is 0000000300000000000000000000000000003
attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,021 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000300000000000000000000000000003 whose rowKey is 0000000400000000000000000000000000004
attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,023 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000400000000000000000000000000004 whose rowKey is 0000000500000000000000000000000000005
attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,024 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000500000000000000000000000000005 whose rowKey is 0000000500000000000000000000000000005
===================================================================================================
If we add the following two lines to the reducer code:
====================================================================================================
boolean workAround = getConf().getBoolean(NgActivityGatheringJob.NG_AVRO_BUG_WORKAROUND, true);
Utf8 dupKey = (workAround) ? new Utf8(key.toString()) : key; // use dupKey instead of key passed to reducer
===================================================================================================
We got the following trace, which we consider the correct behavior:
====================================================================================================
2011-03-10 15:04:33,431 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000000000000000000000000000000000 whose rowKey is 0000000000000000000000000000000000000
attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,374 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000000000000000000000000000000000 whose rowKey is 0000000000000000000000000000000000000
attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,381 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000100000000000000000000000000001 whose rowKey is 0000000100000000000000000000000000001
attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,383 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000200000000000000000000000000002 whose rowKey is 0000000200000000000000000000000000002
attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,389 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000300000000000000000000000000003 whose rowKey is 0000000300000000000000000000000000003
attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,391 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000400000000000000000000000000004 whose rowKey is 0000000400000000000000000000000000004
attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,393 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000500000000000000000000000000005 whose rowKey is 0000000500000000000000000000000000005
===================================================================================================
Ey-Chih Chow
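The two-line workaround takes a copy of the key before anything else can touch it. The thread does not establish why the rowKey drifts to the next key, and Scott only suspects object re-use. Purely to illustrate how an up-front copy can help, the toy simulation below assumes (hypothetically) that the framework loads the next group's key into the same key object once the current group's values are consumed. DupKeyDemo and MutableKey are invented names, and MutableKey stands in for Utf8; this is not Hadoop or Avro code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation, not Hadoop or Avro code; every name here is hypothetical.
// ASSUMPTION (for illustration only): the "framework" keeps ONE mutable key object
// and loads the next group's key into it once the current group's values are consumed.
public class DupKeyDemo {

    /** Hypothetical stand-in for a mutable key type such as Avro's Utf8. */
    static final class MutableKey {
        private String value;

        MutableKey(String value) {
            this.value = value;
        }

        void set(String value) {
            this.value = value;
        }

        @Override
        public String toString() {
            return value;
        }
    }

    /** Runs a toy reduce over already-grouped keys and returns the log lines it would emit. */
    public static List<String> run(String[] sortedKeys, boolean workAround) {
        List<String> log = new ArrayList<>();
        MutableKey shared = new MutableKey(null); // the single reused key instance
        for (int i = 0; i < sortedKeys.length; i++) {
            shared.set(sortedKeys[i]);
            MutableKey key = shared; // what the reducer receives

            // Mirrors the thread's workaround: copy the key up front, or alias it.
            MutableKey dupKey = workAround ? new MutableKey(key.toString()) : key;
            String workingOn = key.toString();

            // Simulated read-ahead while consuming values (the assumption above).
            if (i + 1 < sortedKeys.length) {
                shared.set(sortedKeys[i + 1]);
            }

            String rowKey = dupKey.toString(); // derived after the values were consumed
            log.add("working on " + workingOn + " whose rowKey is " + rowKey);
        }
        return log;
    }

    public static void main(String[] args) {
        String[] keys = {"k0", "k1", "k2"};
        run(keys, false).forEach(System.out::println);
        run(keys, true).forEach(System.out::println);
    }
}
```

With workAround set to false, this sketch logs "working on k0 whose rowKey is k1" and "working on k1 whose rowKey is k2" before ending on a matching pair, a pattern resembling the first trace; with true, every rowKey matches its key. Whether the real job fails for this reason is exactly what the unit test Scott asked for would have to show.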
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: RE: is this a bug?
Date: Thu, 10 Mar 2011 14:28:41 -0800
I changed the Games__ field of DeviceRow to
union {null, array<DynamicColumn4Games>} Games__;
and the system no longer seems to complain. Is this the right fix? Thanks.
Ey-Chih Chow

From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: RE: is this a bug?
Date: Thu, 10 Mar 2011 11:33:13 -0800
Thanks. I tried to migrate from 1.4.0 to 1.5.0 and got some error messages that never showed up with 1.4.0. Could you tell me what we should change? Our avdl record, DeviceRow, has a field defined as follows:

union {array<DynamicColumn4Games>,