Avro >> mail # user >> Map output records/reducer input records mismatch


Re: Map output records/reducer input records mismatch
One more update:

Running the job with the -XX:-UseLoopPredicate option gave the same results: the difference between mapper output records and reducer input records persists.

Thanks!

Vyacheslav

On Aug 17, 2011, at 3:56 AM, Scott Carey wrote:

> On 8/16/11 3:56 PM, "Vyacheslav Zholudev" <[EMAIL PROTECTED]>
> wrote:
>
>> Hi, Scott,
>>
>> thanks for your reply.
>>
>>> What Avro version is this happening with? What JVM version?
>>
>> We are using Avro 1.5.1 and Sun JDK 6, but I will have to look up the
>> exact version.
>>
>>>
>>> On a hunch, have you tried adding -XX:-UseLoopPredicate to the JVM args
>>> if it is Sun and JRE 6u21 or later? (Some issues in loop predicates affect
>>> Java 6 too, just not as many as the recent news on Java 7.)
>>>
>>> Otherwise, it may well be the same thing as AVRO-782.  Any extra
>>> information related to that issue would be welcome.
>>
>> I will have to collect it. In the meantime, do you have any plausible
>> explanation for the issue besides it being something like AVRO-782?
>
> What is your key type (map output schema, first type argument of Pair)?
> Is your key a Utf8 or String?  I don't have a reasonable explanation at
> this point; I haven't looked into it in depth with a good reproducible
> case.  I have my suspicions about how recycling of the key works, since Utf8
> is mutable and its backing byte[] can end up shared.
>
>
>
>>
>> Thanks a lot,
>> Vyacheslav
>>
>>>
>>> Thanks!
>>>
>>> -Scott
>>>
>>>
>>>
>>> On 8/16/11 8:39 AM, "Vyacheslav Zholudev"
>>> <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have multiple Hadoop jobs that use the Avro mapred API.
>>>> In only one of the jobs do I see a mismatch between the number of map
>>>> output records and reducer input records.
>>>>
>>>> Has anybody encountered such behavior? Can anybody think of possible
>>>> explanations for this phenomenon?
>>>>
>>>> Any pointers/thoughts are highly appreciated!
>>>>
>>>> Best,
>>>> Vyacheslav
>>>
>>>
>>
>> Best,
>> Vyacheslav
>>
>>
>>
>
>
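
For readers following the thread: the suspicion quoted above, that a recycled mutable key whose backing byte[] ends up shared could cause trouble, can be illustrated with a small standalone sketch. This is not Avro code and it does not reproduce the reported mismatch or AVRO-782. ReusableKey is a hypothetical stand-in for a mutable key type such as Utf8, and collect() is a made-up helper that only shows the general hazard of holding references to an object that the framework keeps overwriting.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MutableKeyReuse {

    /** Hypothetical mutable key with a reusable backing array (NOT Avro's Utf8). */
    static final class ReusableKey {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Overwrites the content in place, growing the backing array only when needed. */
        ReusableKey set(String s) {
            byte[] src = s.getBytes(StandardCharsets.UTF_8);
            if (src.length > bytes.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
            return this;
        }

        /** Returns an independent copy that owns its own byte[]. */
        ReusableKey copy() {
            ReusableKey c = new ReusableKey();
            c.bytes = Arrays.copyOf(bytes, length);
            c.length = length;
            return c;
        }

        @Override
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    /**
     * Simulates a loop that recycles one key instance per record.
     * With defensiveCopy == false, every held reference points at the same object.
     */
    static List<String> collect(String[] inputs, boolean defensiveCopy) {
        ReusableKey recycled = new ReusableKey();
        List<ReusableKey> held = new ArrayList<>();
        for (String in : inputs) {
            recycled.set(in);
            held.add(defensiveCopy ? recycled.copy() : recycled);
        }
        List<String> out = new ArrayList<>();
        for (ReusableKey k : held) {
            out.add(k.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] inputs = {"a", "b", "c"};
        System.out.println(collect(inputs, false)); // prints [c, c, c]
        System.out.println(collect(inputs, true));  // prints [a, b, c]
    }
}
```

Without the copy, all three held keys read as the last value written, so distinct keys silently collapse into one. Whether anything like this happens inside the Avro mapred code path is exactly what the thread leaves open.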